The name Schur is associated with many terms and concepts that are widely used in
a number of diverse fields of mathematics and engineering. This survey article focuses
on Schur's work in analysis. Here too, Schur's name is commonplace: the Schur test
and Schur-Hadamard multipliers (in the study of estimates for Hermitian forms), Schur
convexity, Schur complements, Schur's results in summation theory for sequences (in
particular, the fundamental Kojima-Schur theorem), the Schur-Cohn test, the Schur
algorithm, Schur parameters and the Schur interpolation problem for functions that are
holomorphic and bounded by one in the unit disk. In this survey, we shall discuss all of
the above mentioned topics and then some, as well as some of the generalizations that
they inspired. There are nine sections of text, each of which is devoted to a separate
theme based on Schur's work. Each of these sections has an independent bibliography.
There is very little overlap. A tenth section presents a list of the papers of Schur that
focus on topics that are commonly considered to be analysis. We shall begin with a review
of Schur's less familiar papers on the theory of commuting differential operators.

1. Permutable differential operators and fractional powers of
differential operators.

Let

\[ P(y) = p_n(x)\frac{d^n y}{dx^n} + p_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + p_0(x)\,y \tag{1.1} \]

and

\[ Q(y) = q_m(x)\frac{d^m y}{dx^m} + q_{m-1}(x)\frac{d^{m-1} y}{dx^{m-1}} + \cdots + q_0(x)\,y \tag{1.2} \]

be formal differential operators, where $n \ge 0$ and $m \ge 0$ are integers, and $p_k(x)$ and $q_k(x)$
are complex valued functions. Then $Q$ commutes with $P$ if $(PQ)(y) = (QP)(y)$. (It is
assumed that the coefficients $p_k$, $q_l$ are smooth enough, say infinitely differentiable, so
that the product of the two differential expressions is defined according to the usual rule for
differentiating a product. The commutativity $PQ - QP = 0$ means that the appropriate
differential expressions, that are constructed from the coefficients $p_k$, $q_l$ according to the
usual rules for differentiating a product, vanish.)

In [Sch1] Schur proved the following result: Let $P$, $Q_1$ and $Q_2$ be differential operators
of the form (1.1) and (1.2). Assume that each of the operators $Q_1$ and $Q_2$ commutes with
$P$: $PQ_1 = Q_1P$ and $PQ_2 = Q_2P$. Then the operators $Q_1$ and $Q_2$ commute with each
other: $Q_1Q_2 = Q_2Q_1$.

This result of Schur was forgotten and was rediscovered by S. Amitsur ([Ami], Theorem
1) and by I.M. Krichever ([Kri1], Corollary 1 of Theorem 1.2). (Amitsur does not mention
the result of Schur, and Krichever does not mention either the result of Schur or the result
of Amitsur in [Kri1], but does refer to Amitsur in a subsequent paper [Kri2].)

The method used by Schur to obtain this result is no less interesting than the result
itself. In modern language, Schur developed the calculus of formal pseudodifferential
operators in [Sch1]: for every integer $n$ (positive, negative or zero), Schur considers the
formal differential "Laurent" series of the form

\[ F = \sum_{\substack{-\infty < k \le n \\ k \text{ an integer}}} f_k(x)\,D^k \tag{1.3} \]

where the coefficients $f_k(x)$, $-\infty < k \le n$, are smooth complex-valued functions of $x$
and $D = \frac{d}{dx}$. (He does not discuss the existence of an operator in a space of functions
that corresponds to this formal series.) The sum of two formal "Laurent" series and the
product of such a series and a complex constant are defined in the usual way. To define
the product $F \circ G$ of two such series $F$ and

\[ G = \sum_{\substack{-\infty < l \le m \\ l \text{ an integer}}} g_l(x)\,D^l \tag{1.4} \]

one needs a rule for commuting powers of the operator $D$ with powers of the operator of
multiplication by the function $a(x)$. This rule is defined by the formulas

\[ Da = a(x)D + a'(x)I, \]

and

\[ D^{-1}a = a(x)D^{-1} - a'(x)D^{-2} + a''(x)D^{-3} + \cdots + (-1)^{k-1}a^{(k-1)}(x)D^{-k} + \cdots, \]

where $a'(x), a''(x), \ldots, a^{(k-1)}(x), \ldots$ are the derivatives of the function $a(x)$ of the
indicated order. The set of all formal differential "Laurent" series provided with such
operations becomes an associative (but not commutative) ring over the field of complex
numbers. If the function $f_n(x)$ is invertible (in which case we can and will assume that
$f_n(x) = 1$), then the formal Laurent series (1.3) is invertible, and its inverse is of the form

\[ H = \sum_{-\infty < l \le -n} h_l(x)\,D^l \tag{1.5} \]

where $h_{-n}(x) = 1$, and the coefficients $h_k(x)$ are polynomials in the functions $f_k(x)$, $k < n$,
and their derivatives.
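As a quick consistency check on the two commutation rules (a sketch added for illustration, not a computation quoted from [Sch1]), one can multiply the expansion of $D^{-1}a$ on the left by $D$ and apply $Da = aD + a'I$ term by term. The sum telescopes, which is what one expects of an inverse of $D$:

```latex
\begin{align*}
D \circ \bigl(D^{-1}a\bigr)
  &= D\,a\,D^{-1} - D\,a'\,D^{-2} + D\,a''\,D^{-3} - \cdots \\
  &= (aD + a')D^{-1} - (a'D + a'')D^{-2} + (a''D + a''')D^{-3} - \cdots \\
  &= a + a'D^{-1} - a'D^{-1} - a''D^{-2} + a''D^{-2} + a'''D^{-3} - \cdots \\
  &= a .
\end{align*}
```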

In particular, a differential operator $P$ of the form (1.1) may be considered as a formal
"Laurent" series (1.3) whose "positive" part $F_+ = \sum_{0 \le k \le n} f_k(x)D^k$ coincides with $P$ and
whose "negative" part $F_- = \sum_{-\infty < k < 0} f_k(x)D^k$ vanishes. In [Sch1], Schur proved that if each
of two formal differential Laurent series $F_1$ and $F_2$ commutes with a differential operator
$P$ of the form (1.1): $P \circ F_1 = F_1 \circ P$ and $P \circ F_2 = F_2 \circ P$, then $F_1$ and $F_2$ commute
with each other: $F_1 \circ F_2 = F_2 \circ F_1$. In particular, this result is applicable to polynomial
differential operators $Q_1$ and $Q_2$ of the form (1.2) commuting with $P$. ($Q_1$ and $Q_2$ are
considered as formal differential Laurent series whose "negative" parts are equal to zero.)
Schur gives an explicit description of the commutant of the differential operator $P$ (and,
even more generally, the description of the commutant of any formal differential Laurent
series). The notion of the fractional power $P^{1/n}$ of the differential operator $P$ is involved in
this description.

Let $n > 0$ and let $F$ be a formal differential Laurent series of the form (1.3). The formal
differential Laurent series $F^{1/n}$ is defined as the formal differential series $R$ for which the
equality

\[ \underbrace{R \circ R \circ \cdots \circ R}_{n \text{ times}} = F \tag{1.6} \]

holds. In [Sch1] it is proved that if the function $(f_n(x))^{1/n}$ exists (in which case we can
and will assume that $f_n(x) = 1$), then such a series $R = F^{1/n}$ exists and is of the form

\[ R = \sum_{\substack{-\infty < p \le 1 \\ p \text{ an integer}}} r_p(x)\,D^p, \tag{1.7} \]

where $r_1(x) = 1$. The coefficients $r_0(x), r_{-1}(x), r_{-2}(x), \ldots$ can be determined in a
recursive manner as polynomials of the functions $f_{n-1}(x), f_{n-2}(x), \ldots, f_0(x), \ldots$ and their
derivatives. The differential Laurent series $F^{k/n}$ ($k$ an integer) is defined as $F^{k/n} = (F^{1/n})^k$.

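For example, if

\[ L = D^2 + q(x)I \tag{1.8} \]

is a Sturm-Liouville differential operator of second order, then

\[ L^{1/2} = D + s_0(x)I + s_{-1}(x)D^{-1} + s_{-2}(x)D^{-2} + s_{-3}(x)D^{-3} + s_{-4}(x)D^{-4} + \cdots, \tag{1.9} \]

where

\[ s_0(x) = 0, \quad s_{-1}(x) = \tfrac{1}{2}q(x), \quad s_{-2}(x) = -\tfrac{1}{4}q'(x), \quad s_{-3}(x) = \tfrac{1}{8}q''(x) - \tfrac{1}{8}q^2(x), \quad s_{-4}(x) = -\tfrac{1}{16}q'''(x) + \tfrac{3}{8}q(x)q'(x), \ \ldots \tag{1.10} \]

Furthermore, $L^{3/2} = (L^{1/2})^3 = L \cdot L^{1/2} = L^{1/2} \cdot L$, and we can calculate

\[ L^{3/2} = D^3 + t_2(x)D^2 + t_1(x)D + t_0(x)I + t_{-1}(x)D^{-1} + \cdots, \tag{1.11} \]

where

\[ t_2(x) = 0, \quad t_1(x) = \tfrac{3}{2}q(x), \quad t_0(x) = \tfrac{3}{4}q'(x), \quad t_{-1}(x) = \tfrac{1}{8}q''(x) + \tfrac{3}{8}q^2(x), \ \ldots \tag{1.12} \]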
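The first few coefficients in the expansion (1.9) of $L^{1/2}$ can be recovered by hand. The following sketch is an illustration only: it writes $R = D + s_0 + s_{-1}D^{-1} + s_{-2}D^{-2} + \cdots$, uses nothing but the rule $Da = aD + a'$ and the expansion of $D^{-1}a$ given earlier, and matches the three highest powers of $D$ in $R \circ R = D^2 + q$:

```latex
\begin{align*}
R \circ R &= D^2 + 2s_0 D + \bigl(2s_{-1} + s_0' + s_0^2\bigr)
   + \bigl(2s_{-2} + s_{-1}' + 2s_0 s_{-1}\bigr)D^{-1} + \cdots, \\
D^{1}:&\quad 2s_0 = 0 \;\Longrightarrow\; s_0 = 0, \\
D^{0}:&\quad 2s_{-1} + s_0' + s_0^2 = q \;\Longrightarrow\; s_{-1} = \tfrac12 q, \\
D^{-1}:&\quad 2s_{-2} + s_{-1}' + 2s_0 s_{-1} = 0 \;\Longrightarrow\; s_{-2} = -\tfrac14 q'.
\end{align*}
```

Each further power of $D$ determines the next coefficient in the same way, which is the recursive procedure referred to in the text.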

In [Sch1] it is proved that the formal differential Laurent series $F$ commutes with a
differential operator $P$ of the form (1.1) (of order $n$) if and only if $F$ is of the form

\[ F = \sum_{-\infty < k \le n} c_k\,P^{k/n}, \tag{1.13} \]

where $c_k$ are complex constants ($c_k$ do not depend on $x$). In particular, some series of
the form (1.13) can in fact be differential operators (if by some very special choice of the
$c_k$ the negative part $F_-$ of the series (1.13) vanishes). These and only these differential
operators commute with $P$. Moreover, it is clear that series of the form (1.13) commute
with each other.

The results of Schur on fractional powers of differential operators were forgotten. The
resurgence of interest in this topic is related to the inverse scattering method for solving
non-linear evolution equations. The inverse scattering method was discovered and applied
to the Korteweg-de Vries equation by C. Gardner, J. Greene, M. Kruskal and R. Miura in
their famous paper [GGKM]. This method was then extended to some other important
equations towards the end of the sixties. P. Lax [Lax] developed some machinery (that
is now commonly known as the method of $L$-$A$ pairs, or Lax pairs) that allows one to
use the inverse scattering formalism in a more organized way. The first step of the Lax
method is to express the given evolution equation in the form

\[ \frac{\partial L}{\partial t} = [A, L], \tag{1.14} \]

where $L$ is a differential operator (with respect to $x$), some of whose coefficients depend
on $t$, $A$ is a differential operator with respect to $x$ that does not depend on $t$, and (the
commutator) $[A, L] = AL - LA$. In subsequent developments, the evolution equation
(1.14) was investigated using various analytic methods drawn from the theory of inverse
spectral and scattering problems and the Riemann-Hilbert problem, among others. In
their article [GeDi], I.M. Gel'fand and L.A. Dikii (= L. Dickey) observed that fractional
powers of differential operators can help in a systematic search for pairs $L$ and $A$ whose
commutator $[A, L]$ is related to a nonlinear evolution equation. The idea of Gel'fand
and Dikii is to consider the "positive" part $(L^{\alpha})_+$ of some fractional power $L^{\alpha}$ as such
an operator $A$. Let us explain how the fractional powers of the Sturm-Liouville operator
$L$ of the form (1.8) can be applied to construct the $L$-$A$ pair for the Korteweg-de
Vries equation. Since an operator $L$ of the form (1.8) is of second order, it suffices to
consider only integer and half-integer powers of $L$. Integer powers do not lead to anything
useful: the appropriate $A$ just commutes with $L$. Half-integer powers are more interesting.
According to (1.9)-(1.10), $(L^{1/2})_+ = D$. The direct computation of the commutator gives:
$[A, L] = q'I$ for $A = D$. The evolution equation (1.14) is of the form
$\frac{\partial q}{\partial t} = \frac{\partial q}{\partial x}$ in this case.
The case $A = (L^{3/2})_+$ is much more interesting. From (1.11)-(1.12) it follows that

\[ A = D^3 + \frac{3}{2}q(x)D + \frac{3}{4}q'(x)I. \tag{1.15} \]

The direct calculation of the commutator of the differential expressions $A$ and $L$ of the
forms (1.15) and (1.8), respectively, gives

\[ [A, L] = \frac{1}{4}q'''(x) + \frac{3}{2}q(x)q'(x). \tag{1.16} \]

Thus, the evolution equation (1.14) takes the form

\[ \frac{\partial q}{\partial t} = \frac{1}{4}\frac{\partial^3 q}{\partial x^3} + \frac{3}{2}\,q\,\frac{\partial q}{\partial x}. \tag{1.17} \]

This is the Korteweg-de Vries equation. In the paper [GeDi] a symplectic structure
was introduced and a Hamiltonian formalism was developed. The approach of Gel'fand
and Dikii was further developed by M. Adler [Adl] and by B.M. Lebedev and Yu.I. Manin
[LebMa]. However, the results of Schur on permutable differential expressions and on
fractional powers of differential expressions are not mentioned either in [Adl] or in [LebMa],
nor are they mentioned in the well-known surveys [Man], [Tsuj], dedicated to algebraic
aspects of non-linear differential equations. The fact that these results of Schur were
largely forgotten may be due to the lack of a natural area of application for a long time.
We found only one modern source where this aspect of Schur's work is mentioned: the Tata
Lectures by D. Mumford. Mumford cites the paper [Sch1] in Chapter IIIa, §11 of [Mum]
(Proposition 11.7).
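The commutator identity behind the Korteweg-de Vries pair can be checked mechanically. The sketch below is an illustration only: it assumes polynomial coefficients $q$ and polynomial test functions $f$, stores a polynomial as a list of exact rational coefficients (constant term first), and uses hypothetical helper names (`apply_L`, `apply_A`, `commutator`). It applies $A = D^3 + \frac32 qD + \frac34 q'$ and $L = D^2 + q$ to $f$ and compares $A(Lf) - L(Af)$ with $(\frac14 q''' + \frac32 qq')f$.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def scale(c, p):
    return trim(Fraction(c) * a for a in p)


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def diff(p, times=1):
    """Differentiate a polynomial the given number of times."""
    p = list(p)
    for _ in range(times):
        p = [i * c for i, c in enumerate(p)][1:]
    return trim(p)


def apply_L(q, f):
    """L f = f'' + q f."""
    return add(diff(f, 2), mul(q, f))


def apply_A(q, f):
    """A f = f''' + (3/2) q f' + (3/4) q' f."""
    return add(diff(f, 3),
               add(scale(Fraction(3, 2), mul(q, diff(f))),
                   scale(Fraction(3, 4), mul(diff(q), f))))


def commutator(q, f):
    """[A, L] f = A(L f) - L(A f)."""
    return add(apply_A(q, apply_L(q, f)), scale(-1, apply_L(q, apply_A(q, f))))


def expected(q, f):
    """(q'''/4 + (3/2) q q') f, the right-hand side of the identity."""
    rhs = add(scale(Fraction(1, 4), diff(q, 3)),
              scale(Fraction(3, 2), mul(q, diff(q))))
    return mul(rhs, f)


# Arbitrary sample data; any polynomials would do.
q = [Fraction(c) for c in (1, -2, 0, 5, 3)]
f = [Fraction(c) for c in (2, 1, -1, 0, 4, 7)]
print(commutator(q, f) == expected(q, f))
```

Because the arithmetic is exact, agreement on a handful of polynomials is a meaningful sanity check, although it is of course not a proof for general smooth $q$.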

The paper [Sch1] does not discuss the structure of the set of differential expressions
which commute with a given operator $P$. The answer "the differential expressions which
commute with $P$ are those formal Laurent series in $P^{1/n}$ for which the negative part vanishes"
is not satisfactory because it just replaces the original question by the question "what
is the structure of formal Laurent series in $P^{1/n}$ for which the negative part vanishes?". Of
course, if $P$ is a given differential operator and $b$ is a polynomial with constant coefficients
then the differential operator $Q = b(P)$ commutes with $P$. More generally, if $Z$ is any
differential operator and $a$ and $b$ are polynomials with constant coefficients then the
operators $P = a(Z)$ and $Q = b(Z)$ commute with each other. However, there exist pairs of
commuting differential operators $P$, $Q$ which are not representable in the form $P = a(Z)$,
$Q = b(Z)$. (See formula (1) in [BuCh1].) The problem of describing pairs of commuting
differential operators was essentially solved by J.L. Burchnall and T.W. Chaundy [BuCh1],
[BuCh2], [BuCh3] in the twenties. See also [Bak1]. (The complete answer was obtained
for those pairs $P$, $Q$ whose orders are coprime.) The answer was expressed in terms of
Abelian functions. In particular, it was proved that the commuting pair $P$, $Q$ satisfy the
equation

\[ r(P, Q) = 0, \tag{1.18} \]

where $r(\lambda, \mu)$ is a (non-zero) polynomial of two variables with constant coefficients. (This
result is known as the Burchnall-Chaundy lemma.) The remarkable papers [BuCh1],
[BuCh2], [BuCh3], [Bak1] were forgotten. Their results were rediscovered and further
developed by I.M. Krichever, [Kri1], [Kri2], [Kri3], [Kri4] in the seventies. (When Krichever
started his investigations in this direction, he was not aware of the results of Burchnall
and Chaundy. In his paper [Kri1] he mentioned only the relevant recent works of a group
of Moscow mathematicians. However, in his subsequent papers he referred to [BuCh1],
[BuCh2], [BuCh3] and [Bak1]; see the "Note in Proof" at the end of [Kri2] and references
[2-4] in [Kri3].)

Thus, the history of commuting differential expressions, which began with the work of
Schur [Sch1], is rich in forgotten and rediscovered results.

2. Generalized limits of infinite sequences
and their matrix transformations.



One of the basic notions of mathematical analysis is the notion of the limit of a sequence
of real or complex numbers. A sequence $\{x_k\}_{1 \le k < \infty}$ of complex numbers for which
$\lim_{k\to\infty} x_k$ exists is said to be convergent. A sequence $\{x_k\}_{1 \le k < \infty}$ of complex numbers
for which $\sup_k |x_k| < \infty$ is said to be bounded. Let the set of all convergent sequences
be denoted by $c$, the set of all bounded sequences be denoted by $m$, and the set of all
sequences be denoted by $s$.

It is clear that each of the sets $c$, $m$ and $s$ is a vector space, and that $c \subset m \subset s$.

Sometimes one needs to define a generalized limit $R\text{-}\lim_{k\to\infty} x_k$ (according to some rule
$R$) for some sequences for which the "usual" limit $\lim_{k\to\infty} x_k$ may not exist. Let $c_R$
denote the set of all sequences $\{x_k\}_{1 \le k < \infty}$ for which $R\text{-}\lim_{k\to\infty} x_k$ is defined (in other
words, "exists"). Usually some natural requirements are imposed on such a rule $R$. Thus,
for example, it is often required that the set $c_R$ be a vector space. In this case, if the
condition

\[ c \subseteq c_R \quad\text{and}\quad R\text{-}\lim_{k\to\infty} x_k = \lim_{k\to\infty} x_k \ \text{ for all } \{x_k\}_{1 \le k < \infty} \in c \]

is satisfied, then the generalized limit $R$-lim is said to be regular.

A familiar example of a generalized limit is the well known (Cesàro) $C$-limit: Given a
sequence $\{x_k\}_{1 \le k < \infty}$, the sequence $\{y_k\}_{1 \le k < \infty}$ is defined as

\[ y_n = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (n = 1, 2, 3, \ldots). \]

By definition, the $C$-limit of the sequence $\{x_k\}$ exists if the usual limit of the sequence $\{y_k\}$
exists, and

\[ C\text{-}\lim_{k\to\infty} x_k = \lim_{k\to\infty} y_k. \]

It is not difficult to prove that the $C$-limit is regular. There are sequences for which the
$C$-limit exists, but the "usual" limit does not exist. For example, the sequence
$x_k = \frac{1 + (-1)^k}{2}$ does not tend to a limit as $k \to \infty$, but the Cesàro limit exists, and
$C\text{-}\lim_{k\to\infty} x_k = \frac{1}{2}$.

Matrix transformations can be used to define generalized limits. Let $A$ be an infinite
matrix:

\[ A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} & \cdots \\
a_{21} & a_{22} & \cdots & a_{2k} & \cdots \\
\vdots & \vdots & & \vdots & \\
a_{n1} & a_{n2} & \cdots & a_{nk} & \cdots \\
\vdots & \vdots & & \vdots &
\end{bmatrix} \tag{2.1} \]

where the matrix entries $a_{jk}$ are real or complex numbers. The matrix transformation
$x \mapsto y = Ax$ is defined for those sequences $x = \{x_k\}_{1 \le k < \infty}$ for which all the series
$\sum_{1 \le k < \infty} a_{nk}x_k$, $n = 1, 2, 3, \ldots$, converge. The resulting sequence $y = Ax$, $y = \{y_k\}_{1 \le k < \infty}$,
is defined by

\[ y_n = \sum_{1 \le k < \infty} a_{nk}x_k \qquad (n = 1, 2, 3, \ldots). \]

It is clear that the domain of definition $\mathcal{D}_A$ of the matrix transformation generated by
the matrix $A$ is a vector space, $\mathcal{D}_A \subseteq s$.
Moreover, there is a natural generalized limit associated with each such infinite matrix
$A$ (that we denote as the $A$-limit and which we shall refer to as the matrix generalized limit
generated by the matrix $A$). Namely, by definition, the $A$-limit of a sequence
$x = \{x_k\}_{1 \le k < \infty}$ exists if $x \in \mathcal{D}_A$ (i.e. the matrix transformation $Ax$ is defined), and the
sequence $y = Ax$ is convergent: $y \in c$. By definition,

\[ A\text{-}\lim_{k\to\infty} x_k = \lim_{k\to\infty} y_k. \tag{2.2} \]

The Cesàro generalized limit ($C$-limit) can be considered as the matrix generalized limit
generated by the lower triangular matrix $A$ for which $a_{nk} = \frac{1}{n}$ for $k = 1, 2, \ldots, n$
and $a_{nk} = 0$ for $k > n$. The systematic investigation of matrix generalized limits
was initiated by O. Toeplitz, [Toep]. A fundamental contribution to the theory of matrix
generalized limits was made by Schur. In [Sch16] he introduced three classes of matrix
transformations: convergence preserving, convergence generating and regular.
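The description of the Cesàro limit as a matrix generalized limit can be tried out on a finite section of the matrix. The sketch below is illustrative only: the helper names `cesaro_row` and `matrix_transform` are hypothetical, the matrix is given row by row as $a_{nk} = 1/n$ for $k \le n$, and the input is the oscillating sequence $0, 1, 0, 1, \ldots$, chosen here as a simple non-convergent test case.

```python
from fractions import Fraction


def cesaro_row(n):
    """Nonzero part of row n (1-based) of the Cesaro matrix: a_nk = 1/n for k <= n."""
    return [Fraction(1, n)] * n


def matrix_transform(row_fn, x):
    """Apply a lower-triangular matrix, supplied row by row, to a finite prefix of x.

    zip() stops at the end of the row, so entries beyond the diagonal count as zero.
    """
    return [sum(a * xk for a, xk in zip(row_fn(n), x))
            for n in range(1, len(x) + 1)]


# Oscillating sequence 0, 1, 0, 1, ... : no ordinary limit.
x = [Fraction(1 + (-1) ** k, 2) for k in range(1, 401)]
y = matrix_transform(cesaro_row, x)

print(float(y[-2]), float(y[-1]))  # transformed values near the end of the prefix
```

The transformed sequence settles near $1/2$ even though the input keeps oscillating, which is the behaviour the generalized limit is designed to capture.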

A matrix transformation $x \mapsto Ax$ is said to be

1. convergence preserving, if it is defined for every sequence $x \in c$, and for $x \in c$ the
sequence $y = Ax$ belongs to $c$ as well.

2. convergence generating, if it is defined for every sequence $x \in m$, and for $x \in m$ the
sequence $y = Ax$ belongs to $c$.

3. regular, if it is convergence preserving and, moreover, if $x \in c$ and $y = Ax$, then the
equality $\lim_{k\to\infty} y_k = \lim_{k\to\infty} x_k$ holds.



Schur obtained necessary and sufficient conditions for a matrix transformation $x \mapsto Ax$
to belong to each of these three classes. These conditions are presented in the following
three theorems that are taken from [Sch16]. They are formulated in terms of the numbers

\[ \sigma_n = \sum_{1 \le k < \infty} a_{nk} \quad\text{and}\quad \zeta_n = \sum_{1 \le k < \infty} |a_{nk}| \tag{2.3} \]

for those $n$, $1 \le n < \infty$, for which these values exist. The values $\sigma_n$ are said to be the
row sums; the values $\zeta_n$ are said to be the row norms.

THEOREM I. The matrix transformation $A$ is convergence preserving if and only if the
following three conditions are satisfied:

1. For every $k$ the following limit exists:

\[ a_k = \lim_{n\to\infty} a_{nk}. \tag{2.4} \]

2. The row sums $\sigma_n$ tend to a finite limit $\sigma$:

\[ \sigma = \lim_{n\to\infty} \sigma_n. \tag{2.5} \]

3. The sequence of row norms is bounded:

\[ \sup_{1 \le n < \infty} \zeta_n < \infty. \tag{2.6} \]

If these conditions are satisfied, then the series $\sum_{1 \le k < \infty} a_k$ converges absolutely and, if

\[ a = \sum_{1 \le k < \infty} a_k, \tag{2.7} \]

then for every convergent sequence $\{x_k\}$

\[ \lim_{n\to\infty} \sum_{1 \le k < \infty} a_{nk}x_k = (\sigma - a)\lim_{k\to\infty} x_k + \sum_{1 \le k < \infty} a_k x_k. \tag{2.8} \]

THEOREM II. The convergence preserving matrix transformation $A$ is regular if and
only if all the column limits $a_k$ defined in (2.4) are equal to zero:

\[ a_k = 0 \qquad (k = 1, 2, 3, \ldots), \tag{2.9} \]

and the limit $\sigma$ of the row sums $\sigma_n$ defined in (2.5) is equal to 1:

\[ \sigma = 1. \tag{2.10} \]
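The limit formula of Theorem I and the regularity criterion of Theorem II can be illustrated numerically. The matrix below is a made-up example, not one taken from [Sch16]: $a_{n1} = \frac12 + \frac{1}{2n}$ and $a_{nk} = \frac{1}{2n}$ for $2 \le k \le n$. Its row sums equal 1 and its row norms are bounded, but the first column limit is $\frac12$ rather than 0, so it should be convergence preserving without being regular. For a convergent input the predicted limit is (limit of row sums minus sum of column limits) times $\lim x_k$, plus the sum of column limits times the corresponding $x_k$.

```python
def entry(n, k):
    """Hypothetical example matrix: a_n1 = 1/2 + 1/(2n), a_nk = 1/(2n) for 2 <= k <= n."""
    if k > n:
        return 0.0
    return 1.0 / (2 * n) + (0.5 if k == 1 else 0.0)


def transform(n, x):
    """n-th term of y = A x, using only the first n entries of x (lower-triangular A)."""
    return sum(entry(n, k) * x[k - 1] for k in range(1, n + 1))


N = 2000
x = [1.0 + 1.0 / k for k in range(1, N + 1)]            # converges to 1, with x_1 = 2

row_sum = sum(entry(N, k) for k in range(1, N + 1))     # sigma_N
row_norm = sum(abs(entry(N, k)) for k in range(1, N + 1))
first_column = entry(N, 1)                              # approaches the column limit 1/2

# Predicted limit: (sigma - a) * lim x + a_1 * x_1 with sigma = 1, a = a_1 = 1/2.
predicted = (1.0 - 0.5) * 1.0 + 0.5 * x[0]

print(row_sum, row_norm, first_column)
print(transform(N, x), predicted)
```

The transformed sequence approaches the predicted value rather than $\lim x_k = 1$, which is how the failure of condition (2.9) shows up in practice.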

THEOREM III. The matrix transformation $A$ is convergence generating if and only if the
three assumptions of Theorem I are satisfied and the series $\sum_{1 \le k < \infty} |a_{nk}|$, $n = 1, 2, 3, \ldots$,
converge uniformly with respect to $n$. In this case

\[ \lim_{n\to\infty} \sum_{1 \le k < \infty} a_{nk}x_k = \sum_{1 \le k < \infty} a_k x_k. \tag{2.11} \]

Theorem II was formulated and proved by O. Toeplitz in ([Toep]). However, Toeplitz
considered only lower-triangular matrices $A$. Theorem II is commonly known as the
Toeplitz theorem or as the Silverman-Toeplitz theorem, since part of Theorem II was
obtained also by L.L. Silverman in his PhD thesis, [Silv]. Theorem I is known as the
Schur-Kojima theorem. (Part of Theorem I was also obtained by T. Kojima for
lower-triangular matrices.) The paper [Koj] by Kojima was published earlier than the paper
[Sch16] by Schur. However, in a footnote on the last page of [Sch16], Schur remarks
that he only became aware of the paper [Koj] while reading the proofs of his own paper.
The matrices $A$ which correspond to convergence generating transformations are called
Schur matrices in [Pet]. There is a rich literature dedicated to matrix generalized limits
and to matrix summation methods. (If a considered sequence is a sequence of partial
sums of a series, then the terminology "generalized summation method" is used instead
of "generalized limit" or "generalized limitation method".) We mention only the books
[Bo], [Coo], [Har], [Pet], [Pey] and [Zel]. In all these books, the sections that deal with
the basic theory of generalized limits and generalized summation methods cite the results
of Schur and refer to him as one of the founders of this theory.

In a footnote near the beginning of his paper [Sch16], Schur notes that his considerations
have many points in common with the considerations of H. Lebesgue and H. Hahn,
dedicated to sequences of integral transformations of the form

\[ y_n(r) = \int_a^b A_n(r, s)\,x(s)\,ds. \]

He also considers some applications of his Theorems I-III to the multiplication of series
and to Tauberian theorems. In particular, he derives the Tauberian theorem of Tauber
(about power series) from Theorem II.

In his other paper [Sch6], Schur considers Hölder and Cesàro limit methods of $r$-th order
and proves that these limit methods are equivalent.

Given a sequence $x_1, x_2, x_3, \ldots$ of real or complex numbers, we form the sequences

\[ h_n^{(1)} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad h_n^{(2)} = \frac{h_1^{(1)} + h_2^{(1)} + \cdots + h_n^{(1)}}{n}, \]

\[ h_n^{(3)} = \frac{h_1^{(2)} + h_2^{(2)} + \cdots + h_n^{(2)}}{n}, \quad \ldots, \quad h_n^{(r)} = \frac{h_1^{(r-1)} + h_2^{(r-1)} + \cdots + h_n^{(r-1)}}{n}. \]

The sequence $h_1^{(r)}, h_2^{(r)}, \ldots, h_n^{(r)}, \ldots$ is said to be the sequence of Hölder means of order
$r$ (constructed from the initial sequence $x_1, x_2, x_3, \ldots$). Another class of sequences can
be constructed as follows. Let

\[ s_n^{(1)} = x_1 + x_2 + \cdots + x_n, \qquad s_n^{(2)} = s_1^{(1)} + s_2^{(1)} + \cdots + s_n^{(1)}, \]

\[ s_n^{(3)} = s_1^{(2)} + s_2^{(2)} + \cdots + s_n^{(2)}, \quad \ldots, \quad s_n^{(r)} = s_1^{(r-1)} + s_2^{(r-1)} + \cdots + s_n^{(r-1)}, \]

and set

\[ c_n^{(r)} = \frac{s_n^{(r)}}{\binom{n+r-1}{r}}. \]

The sequence $c_1^{(r)}, c_2^{(r)}, \ldots, c_n^{(r)}, \ldots$ is said to be the sequence of Cesàro means of order $r$
(constructed from the initial sequence $x_1, x_2, x_3, \ldots$). The transformations

\[ \{x_1, x_2, \ldots, x_k, \ldots\} \to \{h_1^{(r)}, h_2^{(r)}, \ldots, h_k^{(r)}, \ldots\} \]

and

\[ \{x_1, x_2, \ldots, x_k, \ldots\} \to \{c_1^{(r)}, c_2^{(r)}, \ldots, c_k^{(r)}, \ldots\} \]

can be considered as matrix transformations based on appropriately defined matrices
that we denote by $H^{(r)}$ and $C^{(r)}$, respectively. These matrices are lower-triangular. Both
generalized limits, the $H^{(r)}$-limit and the $C^{(r)}$-limit, are regular. In [Sch6] it is shown that
these generalized limits are equivalent in the following sense:

Let a sequence $x_1, x_2, x_3, \ldots$ and a natural number $r$ be given. Then the sequence of
Cesàro means $\{c_1^{(r)}, c_2^{(r)}, \ldots, c_k^{(r)}, \ldots\}$ tends to a finite limit if and only if the sequence
of Hölder means $\{h_1^{(r)}, h_2^{(r)}, \ldots, h_k^{(r)}, \ldots\}$ tends to a finite limit. Moreover, in this case,
the two limits must agree.

Schur obtained this result by showing that both the matrices $(H^{(r)})^{-1} \cdot C^{(r)}$ and
$(C^{(r)})^{-1} \cdot H^{(r)}$ satisfy the assumptions of Theorem II (the Toeplitz regularity criterion). Thus, the
appropriate matrix transformations are regular.
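The equivalence of the two methods can be observed on a concrete sequence. The following sketch is only an illustration: the function names are hypothetical, the Cesàro means are normalised by the binomial coefficient $\binom{n+r-1}{r}$ (the usual convention, assumed here), and the test sequence $1, -2, 3, -4, \ldots$ is chosen because its first-order means keep oscillating while its second-order means settle down.

```python
from math import comb


def holder_means(x, r):
    """Apply the arithmetic-mean transformation r times."""
    h = list(x)
    for _ in range(r):
        total, out = 0, []
        for n, value in enumerate(h, start=1):
            total += value
            out.append(total / n)
        h = out
    return h


def cesaro_means(x, r):
    """Sum r times, then divide the n-th iterated sum by binom(n + r - 1, r)."""
    s = list(x)
    for _ in range(r):
        total, out = 0, []
        for value in s:
            total += value
            out.append(total)
        s = out
    return [s_n / comb(n + r - 1, r) for n, s_n in enumerate(s, start=1)]


N = 5000
x = [(-1) ** (k + 1) * k for k in range(1, N + 1)]   # 1, -2, 3, -4, ...

h1 = holder_means(x, 1)
h2 = holder_means(x, 2)
c2 = cesaro_means(x, 2)

print(h1[-2], h1[-1])   # first-order means: still oscillating
print(h2[-1], c2[-1])   # second-order Holder and Cesaro means
```

For order 1 the two constructions coincide by definition; for order 2 they differ term by term but, in this example, both drift towards the same value.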

This result by Schur was not new. At the time that the paper [Sch6] was published,
proofs of the equivalence of Hölder's and Cesàro's methods had already been obtained by
K. Knopp, by W. Schnee and by W.B. Ford. However, these proofs were very computational,
very involved and not very transparent.

3. Estimates for matrix and integral operators,
bilinear forms and related inequalities.

The terms Schur test, Schur (or Hadamard-Schur) multiplication of matrices, Schur (or
Hadamard-Schur) multipliers are all related to Schur's contributions to the estimates of
operators and bilinear forms, see [Sch4]. In this section we consider the Schur test. Results
related to the Schur (or Schur-Hadamard) product will be considered in the next section.

Let $A = \|a_{jk}\|$ be a matrix, finite or infinite, with real or complex entries. This matrix
generates the bilinear form

\[ A(x, y) = \sum_{j,k} a_{jk}\,x_k\,y_j, \tag{3.1} \]

where $x$ and $y$ are vectors with entries $\{x_k\}$ and $\{y_j\}$ that are real or complex. The matrix
$A$ also generates the linear operator

\[ x \mapsto Ax, \quad\text{where}\quad (Ax)_j = \sum_k a_{jk}x_k. \tag{3.2} \]

If the matrix $A$ is not finite, we consider only finite vectors $x$ and $y$, i.e., vectors with only
finitely many nonzero entries. This allows us to avoid troubles related to the convergence
of infinite sums. If the sets of vectors $x$ and $y$ are provided with norms, then a problem
of interest is to estimate the bilinear form (3.1) in terms of the norms of the vectors $x$
and $y$. In particular, the sets of vectors $x$ and $y$ can be provided with $l_2$ norms:

\[ \|x\|_{l_2} = \Bigl\{\sum_k |x_k|^2\Bigr\}^{1/2}, \qquad \|y\|_{l_2} = \Bigl\{\sum_j |y_j|^2\Bigr\}^{1/2}. \tag{3.3} \]

If the estimate

\[ |A(x, y)| \le C\,\|x\|_{l_2}\,\|y\|_{l_2} \tag{3.4} \]

holds for every pair of vectors $x$ and $y$ for some constant $C < \infty$, then the bilinear form
(3.1) is said to be bounded. The smallest constant $C$ for which the inequality (3.4) holds
is denoted by $C_A$ and is termed the norm of the bilinear form $A(x, y)$:

\[ C_A = \sup_{x \ne 0,\, y \ne 0} \frac{|A(x, y)|}{\|x\|_{l_2}\,\|y\|_{l_2}}. \tag{3.5} \]

The norm $C_A$ of the bilinear form generated by the matrix $A$ coincides with the norm of
the linear operator generated by this matrix, considered as a linear operator acting from
$l_2$ into $l_2$:

\[ C_A = \sup_{x \ne 0} \frac{\|Ax\|_{l_2}}{\|x\|_{l_2}}. \tag{3.6} \]

The cases in which it is possible to express the norm $C_A$ in terms of the entries of the
matrix $A$ are very rare. Thus, the problem of estimating the value of $C_A$ in terms of the
matrix entries is a very important problem. In particular, if the matrix $A$ is infinite, it
is important to recognize whether the value $C_A$ is finite or not. Schur made important
contributions to this circle of problems.

In [Sch4] (§2, Theorem I), the following estimate was obtained.

THEOREM (The Schur test). Let $A = \|a_{jk}\|$ be a matrix, and let

\[ \zeta(A) = \sup_j \sum_k |a_{jk}|, \qquad \kappa(A) = \sup_k \sum_j |a_{jk}|. \tag{3.7} \]

Then

\[ C_A \le \sqrt{\zeta(A)\,\kappa(A)}. \tag{3.8} \]

It is enough to prove the estimate (3.8) for finite matrices $A$ (of arbitrary size). The
proof of the estimate (3.8) that was obtained in [Sch4] is based on the fact that

\[ C_A^2 = \lambda_{\max}, \tag{3.9} \]

where $\lambda_{\max}$ is the largest eigenvalue of the matrix $B = A^*A$. Let $\xi = \{\xi_k\}$ be the
eigenvector which corresponds to the eigenvalue $\lambda_{\max}$:

\[ \lambda_{\max}\,\xi = B\xi. \]

Let $|\xi_p| = \max_k |\xi_k|$. Then, since $\lambda_{\max}|\xi_p| \le \bigl(\sum_k |b_{pk}|\bigr)|\xi_p|$, it is easily seen that

\[ \lambda_{\max} \le \sum_k |b_{pk}|, \]

where the $\{b_{jk}\}$ are the entries of the matrix $B = A^*A$: $b_{jk} = \sum_r \overline{a_{rj}}\,a_{rk}$. Thus,

\[ \sum_k |b_{pk}| \le \sum_k \sum_r |a_{rp}|\,|a_{rk}| \le \Bigl(\sum_r |a_{rp}|\Bigr) \cdot \Bigl(\max_r \sum_k |a_{rk}|\Bigr) \le \kappa(A) \cdot \zeta(A). \]

This completes the proof.
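Another proof, which does not use the equality (3.9), is even shorter:

\[ \begin{aligned}
|A(x, y)| &\le \sum_{j,k} |a_{jk}| \cdot |x_k| \cdot |y_j| \\
 &\le \Bigl(\sum_{j,k} |a_{jk}|\,|x_k|^2\Bigr)^{1/2} \cdot \Bigl(\sum_{j,k} |a_{jk}|\,|y_j|^2\Bigr)^{1/2} \\
 &\le \Bigl(\sup_k \sum_j |a_{jk}| \cdot \sum_k |x_k|^2\Bigr)^{1/2} \cdot \Bigl(\sup_j \sum_k |a_{jk}| \cdot \sum_j |y_j|^2\Bigr)^{1/2} \\
 &= \sqrt{\kappa(A)\,\zeta(A)}\;\|x\|_{l_2}\,\|y\|_{l_2}.
\end{aligned} \tag{3.10} \]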



The estimate (3.8) can be considered as a special case of an interpolation theorem that
is obtained by introducing the $l_1$ and $l_\infty$ norms. If $x = \{x_k\}$ is a finite sequence of real or
complex numbers, then these norms are defined by the usual rules:

\[ \|x\|_{l_1} = \sum_k |x_k| \quad\text{and}\quad \|x\|_{l_\infty} = \sup_k |x_k|, \tag{3.11} \]

respectively. If $A$ is a matrix, we can consider the linear operator generated by this matrix
as an operator acting in the space $l_1$ as well as an operator acting in the space $l_\infty$. The
corresponding norms $\|A\|_{l_1 \to l_1}$ and $\|A\|_{l_\infty \to l_\infty}$ are defined by the formulas

\[ \|A\|_{l_1 \to l_1} = \sup_{x \ne 0} \frac{\|Ax\|_{l_1}}{\|x\|_{l_1}} \quad\text{and}\quad \|A\|_{l_\infty \to l_\infty} = \sup_{x \ne 0} \frac{\|Ax\|_{l_\infty}}{\|x\|_{l_\infty}}. \]

Unlike the norm $\|A\|_{l_2 \to l_2}$, the norms $\|A\|_{l_1 \to l_1}$ and $\|A\|_{l_\infty \to l_\infty}$ can be expressed explicitly
in terms of the matrix entries $\{a_{jk}\}$:

\[ \|A\|_{l_1 \to l_1} = \kappa(A) \quad\text{and}\quad \|A\|_{l_\infty \to l_\infty} = \zeta(A), \]

where the numbers $\zeta(A)$ and $\kappa(A)$ are defined in (3.7). The estimate (3.8) takes the form

\[ \|A\|_{l_2 \to l_2} \le \sqrt{\|A\|_{l_1 \to l_1} \cdot \|A\|_{l_\infty \to l_\infty}}. \tag{3.12} \]

The inequality (3.12) is a direct consequence of M. Riesz's Convexity Theorem. To
apply this theorem, let $\|A\|_{l_p \to l_q}$ denote the norm of the operator, generated by a matrix
$A$, considered as an operator from $l_p$ into $l_q$ for $1 \le p \le \infty$, $1 \le q \le \infty$. Then Riesz's
theorem states that $\log \|A\|_{l_p \to l_q}$ is a convex function of the variables $\alpha = 1/p$ and $\beta = 1/q$
in the square $0 \le \alpha \le 1$, $0 \le \beta \le 1$. This theorem can be found in [HLP], Chapter VIII,
sec. 8.13. G.O. Thorin, [Tho], found a very beautiful and ingenious proof of this theorem
using a new method based on Hadamard's Three Circles Theorem from complex analysis.
Therefore this theorem is also called the Riesz-Thorin Convexity Theorem. Now this
theorem is presented in many sources, and even in textbooks. The Riesz-Thorin Convexity
Theorem belongs to a general class of interpolation theorems for linear operators. A
typical interpolation theorem for linear operators deals with a linear operator that is
defined by a certain analytic expression, for example by a certain matrix or kernel, but is
considered not in a fixed space, but in a whole "scale" of spaces. A typical interpolation
theorem claims that if the linear operator, generated by the given expression, is bounded
in two spaces of the considered "scale of spaces", then it also is bounded in all the
"intermediate" spaces. Moreover, the norm of the operator in the "intermediate" spaces
is estimated through the norms of the operators in the original two spaces. The
Riesz-Thorin theorem states that the spaces $l_p$ with $1 \le p \le \infty$ are "intermediate" for the pair
of spaces $l_1$ and $l_\infty$.
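The Schur test in its interpolation form is easy to try on a finite matrix. The sketch below is an illustration under stated assumptions: real entries only, the $l_1 \to l_1$ and $l_\infty \to l_\infty$ norms computed as the largest absolute column and row sums, and the $l_2 \to l_2$ norm approximated from below by power iteration (the function names and the sample Toeplitz section $a_{jk} = 2^{-|j-k|}$ are made up for the example).

```python
import math
import random


def norm_l1(A):
    """Operator norm l1 -> l1: the largest absolute column sum."""
    return max(sum(abs(row[k]) for row in A) for k in range(len(A[0])))


def norm_linf(A):
    """Operator norm l_inf -> l_inf: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_l2_estimate(A, iterations=200, seed=0):
    """Power iteration on A^T A for a real matrix.

    Returns ||A x|| for a unit vector x, so the result never exceeds
    the true l2 -> l2 norm.
    """
    rows, cols = len(A), len(A[0])
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(cols)]
    length = math.sqrt(sum(v * v for v in x))
    x = [v / length for v in x]
    for _ in range(iterations):
        y = [sum(A[j][k] * x[k] for k in range(cols)) for j in range(rows)]   # y = A x
        z = [sum(A[j][k] * y[j] for j in range(rows)) for k in range(cols)]   # z = A^T y
        length = math.sqrt(sum(v * v for v in z))
        if length == 0.0:
            return 0.0
        x = [v / length for v in z]
    y = [sum(A[j][k] * x[k] for k in range(cols)) for j in range(rows)]
    return math.sqrt(sum(v * v for v in y))


# Hypothetical test matrix: a finite Toeplitz section a_jk = w_{j-k}, w_l = 2^{-|l|}.
n = 30
A = [[2.0 ** (-abs(j - k)) for k in range(n)] for j in range(n)]

bound = math.sqrt(norm_l1(A) * norm_linf(A))
estimate = norm_l2_estimate(A)
print(estimate, "<=", bound)
```

Since the power iteration only ever produces a lower estimate of the $l_2 \to l_2$ norm, the printed comparison can confirm the inequality but cannot by itself show how sharp the bound is.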

The estimate (3.8) can also be considered as a special case of another interpolation
theorem for linear operators, the so-called interpolation theorem for modular spaces. This
theorem is based on quite another circle of ideas that are more geometrical in nature and
was partially inspired by Schur's work ([Sch18]). We will discuss this in the next section.

For practical application, the "weighted" version of the Schur estimate (3.8) is useful. In
fact, this version was also considered in ([Sch4]) (but not as explicitly as the "unweighted"
version). In the weighted version, a positive sequence $\{r_k\}$, $r_k > 0$, appears and the
"weighted" $l_1$- and $l_\infty$-norms

\[ \|x\|_{l_1, r} = \sum_k |x_k| \cdot r_k \quad\text{and}\quad \|x\|_{l_\infty, r^{-1}} = \sup_k \frac{|x_k|}{r_k} \tag{3.13} \]

are considered.

THEOREM (The weighted Schur test). Let $A = [a_{jk}]$ be a matrix and let $\{r_k\}$ be a sequence
of strictly positive numbers: $r_k > 0$. Let

\[ \zeta_r(A) = \sup_j \frac{1}{r_j} \sum_k |a_{jk}| \cdot r_k \quad\text{and}\quad \kappa_r(A) = \sup_k \frac{1}{r_k} \sum_j |a_{jk}| \cdot r_j. \tag{3.14} \]

Then the value $C_A$, defined in (3.5), is subject to the bound

\[ C_A \le \sqrt{\zeta_r(A)\,\kappa_r(A)}. \tag{3.15} \]

It is easy to see that

\[ \zeta_r(A) = \sup_{x \ne 0} \frac{\|Ax\|_{l_\infty, r^{-1}}}{\|x\|_{l_\infty, r^{-1}}} \quad\text{and}\quad \kappa_r(A) = \sup_{x \ne 0} \frac{\|Ax\|_{l_1, r}}{\|x\|_{l_1, r}}. \]

Thus, the estimate (3.15) can be presented in the form

\[ \|A\|_{l_2 \to l_2} \le \sqrt{\|A\|_{l_1, r \to l_1, r} \cdot \|A\|_{l_\infty, r^{-1} \to l_\infty, r^{-1}}}. \tag{3.16} \]

The inequality (3.16) is also an "interpolation" inequality. It shows that the space $l_2$ is
an "intermediate" space between the spaces $l_{1, r}$ and $l_{\infty, r^{-1}}$.

The inequality (3.15) can be proved in much the same way as the special case (3.8).

As an example, we consider a Toeplitz matrix, i.e., a matrix $A$ of the form $a_{jk} = w_{j-k}$.
The Schur test leads to the estimate

\[ C_A \le \sum_l |w_l|. \]

The same bound holds for Hankel matrices, i.e., matrices $A$ of the form $a_{jk} = w_{j+k-1}$.

As a second example, let us consider the Hilbert matrix

\[ H = [h_{jk}]_{j,k=1}^{\infty}, \qquad h_{jk} = \frac{1}{j + k - 1}. \]

For this matrix, $\sum_k |h_{jk}| = \infty$, so the "unweighted" Schur test does not work. However, if
we choose $r_l = l^{-\alpha}$ with a fixed $\alpha \in (0, 1)$, then

\[ \sup_{1 \le j < \infty} \Bigl( j^{\alpha} \sum_{1 \le k < \infty} \frac{k^{-\alpha}}{j + k - 1} \Bigr) = s(\alpha) < \infty. \]

Thus, $\|H\| \le s(\alpha)$. Then we can optimize the estimate by choosing the "best" $\alpha$. In the
discrete case the precise value $s(\alpha)$ is unknown. Nevertheless, it is reasonable to choose
$\alpha = 1/2$, since this is the optimum value for the continuous analogue of the matrix:

\[ x^{\alpha} \int_0^{\infty} \frac{y^{-\alpha}}{x + y}\,dy = \frac{\pi}{\sin \pi\alpha}, \qquad \min_{\alpha \in (0, 1)} \frac{\pi}{\sin \pi\alpha} = \pi \ \text{ is attained at the point } \alpha = 1/2. \tag{3.17} \]
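The effect of the weights on the Hilbert matrix can be seen on a finite section. The sketch below is illustrative only: it assumes the entries $1/(j + k - 1)$ and the weights $r_l = l^{-\alpha}$, truncates every sum at $n$ terms, and uses hypothetical function names. It prints the truncated weighted supremum next to the continuous value $\pi / \sin \pi\alpha$ for a few choices of $\alpha$.

```python
import math


def hilbert_entry(j, k):
    return 1.0 / (j + k - 1)


def unweighted_row_sum(j, n):
    """Partial row sum of the Hilbert matrix; grows like log(n) as n increases."""
    return sum(hilbert_entry(j, k) for k in range(1, n + 1))


def weighted_row_sum(j, n, alpha):
    """j^alpha * sum_k k^(-alpha) / (j + k - 1), truncated at n terms."""
    return j ** alpha * sum(k ** (-alpha) * hilbert_entry(j, k)
                            for k in range(1, n + 1))


def weighted_sup(n, alpha):
    """Largest weighted row sum over the n-by-n section."""
    return max(weighted_row_sum(j, n, alpha) for j in range(1, n + 1))


n = 400
print("unweighted row sums:", unweighted_row_sum(1, 40), unweighted_row_sum(1, n))
for alpha in (0.25, 0.5, 0.75):
    print(alpha, weighted_sup(n, alpha), math.pi / math.sin(math.pi * alpha))
```

The unweighted row sums keep growing with the size of the section, while the weighted quantities stay bounded, which is the reason the weighted test succeeds where the unweighted one fails.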

Some other applications of the Schur test can be found in [BiSo], Chapter 2, Section 10.
Schur used the estimate (3.8) in ([Sch4]) to study the infinite Hilbert forms

\[ H^- = \sum_{\substack{p,q=1 \\ p \ne q}}^{\infty} \frac{x_p\,y_q}{p - q}, \qquad H^+ = \sum_{p,q=1}^{\infty} \frac{x_p\,y_q}{p + q - 1}, \tag{3.18} \]

and the generalized Hilbert forms 

oo oo 
p,q=l ^ p,q=l 

For the Hermitian matrices 

N = {H+yH^ + {H-yn- = and AT, = (if+)*i/+ + {H-^yH^, 

the conditions 

°° 1 1 TT^ 

l<5<oo g=— oo ^ ' l<q<oo — oo<r<oo ^ ' 

are satisfied for every $p$. According to (3.8), the estimates

$$C_{H^+} \le \pi, \quad C_{H^-} \le \pi, \quad C_{H_\lambda^+} \le \frac{\pi}{\sin \pi\lambda}, \quad C_{H_\lambda^-} \le \frac{\pi}{\sin \pi\lambda} \tag{3.19}$$

hold. It turns out that in fact equality prevails in the first two inequalities in (3.19), i.e., the method of Schur gives the exact values for the norms of the matrices $H^+$, $H^-$. (See [HLP], Chapter IX.) It should be remarked that an essential part of Chapters VIII and IX of [HLP] is based on results of the paper [Sch4].

In § 6 of [Sch4], the infinite quadratic form

$$F(t) = \sum_{p,q=1}^{\infty} \frac{\sin (p-q)t}{p-q}\, x_p x_q \tag{3.20}$$

is considered, where $t$, $-\pi < t < \pi$, is a parameter. It is shown that the form $F(t)$ is
bounded and that 



oo 



tJ2xl<F{t)<{n-t)J24- (3-21) 

p=i p=i 



It is also shown that the quadratic form

$$\sum_{p,q=1}^{\infty} \left| \frac{\sin (p-q)t}{p-q} \right| x_p x_q$$

is unbounded for $t \ne 0$. The form (3.20) is interesting because it is an example of a symmetric infinite matrix $[a_{pq}]$ that corresponds to a bounded bilinear form, whereas the form related to the matrix $[\,|a_{pq}|\,]$ is unbounded. The Hilbert matrix $[h^-_{pq}]$ also generates a bounded bilinear form $H^-$ (see (3.18)), and the matrix $[\,|h^-_{pq}|\,]$ also corresponds to an unbounded form. However, the Hilbert matrix is antisymmetric.
4. The Schur product and Schur multipliers.



Let $A$ and $B$ be matrices of the same size whose entries are either real or complex numbers (or even belong to some ring): $A = [a_{pq}]$, $B = [b_{pq}]$. The Schur product $A \circ B$ of the matrices $A$ and $B$ is the matrix $C = [c_{pq}]$ (of the same size as $A$ and $B$) for which

$$c_{pq} = a_{pq} \cdot b_{pq} .$$

The term Schur product is used because the product $A \circ B$ was introduced in [Sch4] for matrices, and some basic results about this product were obtained by Schur in that paper. The most basic of these results states that the cone of positive semidefinite matrices is closed under the Schur product. We recall that a square matrix $M = [m_{pq}]$ (with complex entries) is said to be positive semidefinite if the inequality $\sum_{p,q} m_{pq} x_q \overline{x_p} \ge 0$ holds for every sequence $\{x_k\}$ of complex numbers. (In the case of an infinite matrix $M$, only sequences $\{x_k\}$ with finitely many $x_k$ different from zero are considered.)

THEOREM (The Schur product theorem, Theorem VII, [Sch4]). If $A$ and $B$ are positive semidefinite matrices (of the same size), then their Schur product $A \circ B$ is a positive semidefinite matrix as well.

For self-evident reasons, the Schur product is sometimes called the entrywise product or the elementwise product. It is also often referred to as the Hadamard product. The term Hadamard product seems to have appeared in print for the first time in the 1948 (first) edition of [Hall]. This may be due to the well known paper of Hadamard [Had], in which he studied two Maclaurin series $f(z) = \sum_n a_n z^n$ and $g(z) = \sum_n b_n z^n$ with positive radii of convergence and their composition $h(z) = \sum_n a_n b_n z^n$, which he defined as the coefficientwise product. Hadamard showed that $h(\cdot)$ can be obtained from $f(\cdot)$ and $g(\cdot)$ by an integral convolution. He proved that any singularity $z_1$ of $h(\cdot)$ must be of the form $z_1 = z_2 z_3$, where $z_2$ is a singularity of $f(\cdot)$ and $z_3$ is a singularity of $g(\cdot)$. (This result is commonly known as the Hadamard composition theorem.) Even though Hadamard did not study entrywise products of matrices in this paper, the enduring influence of the cited result as well as his mathematical eminence seems to have linked his name firmly with term-by-term products of all kinds, at least for analysts. (Presentations of the Hadamard composition theorem can be found, for example, in [Bie], Theorem 1.4.1, and in [Tit], Section 4.6.)

PROOF of the Schur product theorem. It is enough to prove this theorem for matrices of arbitrary finite size. First we prove the theorem for matrices $A$ and $B$ of rank one. In this case the matrices $A$ and $B$ must be of the form $A = a \cdot a^*$, $B = b \cdot b^*$, where $a$ and $b$ are column vectors. It is evident that the matrix $C = A \circ B$ is of the form $C = c \cdot c^*$, where the column vector $c$ is just the Schur product of the column vectors $a$ and $b$: $c = a \circ b$. Hence, the matrix $C$ is positive semidefinite. In the general case, we use the spectral decomposition theorem. This theorem states that every finite positive semidefinite matrix $M$ admits a decomposition of the form $M = \sum_{\lambda \in \sigma(M)} M(\lambda)$, where the summation index $\lambda$ runs over the spectrum $\sigma(M)$ of the matrix $M$, and the matrices $M(\lambda)$ are either positive semidefinite matrices of rank one or zero matrices. Decomposing the given matrices $A$ and $B$ in this way, $A = \sum_{\lambda \in \sigma(A)} A(\lambda)$, $B = \sum_{\mu \in \sigma(B)} B(\mu)$, we see that

$$A \circ B = \sum_{\lambda \in \sigma(A),\ \mu \in \sigma(B)} A(\lambda) \circ B(\mu)$$

is a sum of positive semidefinite matrices: the Schur product $A(\lambda) \circ B(\mu)$ of positive semidefinite matrices of rank one is a positive semidefinite matrix, whereas, if at least one of the matrices $A(\lambda)$ or $B(\mu)$ is equal to zero, then their Schur product is equal to zero. Thus, the theorem is proved. □
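The two steps of the proof can be checked on small integer matrices. The following is a minimal sketch added for illustration, restricted to real matrices (so that $a^*$ is just the transpose); all function names are hypothetical, and evaluating the quadratic form on a handful of test vectors is a spot check, not a proof of semidefiniteness.

```python
def schur_product(a, b):
    """Entrywise (Schur) product of two matrices of the same size."""
    return [[a[p][q] * b[p][q] for q in range(len(a[0]))] for p in range(len(a))]

def rank_one(v):
    """Return the real rank-one positive semidefinite matrix v v^T."""
    return [[vp * vq for vq in v] for vp in v]

def add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[a[p][q] + b[p][q] for q in range(len(a[0]))] for p in range(len(a))]

def quadratic_form(m, x):
    """Evaluate sum_{p,q} m_pq x_q x_p for a real matrix m and a real vector x."""
    n = len(m)
    return sum(m[p][q] * x[q] * x[p] for p in range(n) for q in range(n))

a, b = [1, -2, 3], [2, 5, -1]
c = [a[k] * b[k] for k in range(3)]
# Rank-one step of the proof: (a a^T) o (b b^T) = (a o b)(a o b)^T.
assert schur_product(rank_one(a), rank_one(b)) == rank_one(c)

# General step: each factor is a sum of rank-one terms, multiplied entrywise.
A = add(rank_one([1, -2, 3]), rank_one([0, 1, 1]))
B = add(rank_one([2, 5, -1]), rank_one([1, 1, 0]))
C = schur_product(A, B)
test_vectors = [[1, 0, 0], [1, -1, 0], [3, 1, -2], [-1, 4, 2], [0, -5, 7]]
values = [quadratic_form(C, x) for x in test_vectors]
```

Integer entries keep the arithmetic exact, so the non-negativity of `values` is not blurred by rounding.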

Every matrix $H$, finite or infinite, generates a linear operator $T_H$ acting in the space of all matrices of the same size as $H$:

$$T_H : A \to H \circ A, \quad\text{or}\quad T_H A = H \circ A .$$

The linear operator $T_H$ is said to be the Schur transformator generated by the matrix $H$. (The term transformator is borrowed from [GoKr], who used it to designate a linear operator that acts in a space of matrices (operators).) If the Schur transformator $T_H$ is a bounded operator in a space of infinite matrices, equipped with a norm, then the matrix $H$ is said to be the Schur multiplier (with respect to this norm).

The first basic estimate of the norm of the transformator $T_H$ was obtained by Schur in [Sch4]:

THEOREM (The Schur estimate for positive definite Schur transformators). Let $H = [h_{pq}]$ be a positive semidefinite matrix for which

$$D_H \stackrel{\text{def}}{=} \sup_p h_{pp} < \infty . \tag{4.1}$$

Then

$$\|H \circ A\|_{l^2 \to l^2} \le D_H \|A\|_{l^2 \to l^2} . \tag{4.2}$$

(Here, as before, $\|A\|_{l^2 \to l^2}$ is the operator norm of the matrix $A$ considered in the appropriate space of sequences.)

PROOF of the estimate (4.2). We reproduce here the reasoning of Schur from [Sch4]. It suffices to consider only finite matrices. The proof is based essentially on the fact that a positive semidefinite matrix $H$ admits a factorization of the form

$$H = L L^* , \tag{4.3}$$

where $L = [l_{pq}]$, i.e.,

$$h_{pq} = \sum_r l_{pr} \overline{l_{qr}} \quad (\forall p, q) . \tag{4.4}$$

Therefore, the number $\sum_{p,q} a_{pq} h_{pq} y_q x_p$ can be rewritten as

$$\sum_{p,q} a_{pq} h_{pq} y_q x_p = \sum_{p,q} a_{pq} \Big( \sum_r l_{pr} \overline{l_{qr}} \Big) y_q x_p = \sum_r \sum_{p,q} a_{pq} (l_{pr} x_p)(\overline{l_{qr}} y_q) .$$

Thus,

$$\begin{aligned}
\Big| \sum_{p,q} a_{pq} h_{pq} x_p y_q \Big|
&\le \sum_r \Big| \sum_{p,q} a_{pq} (l_{pr} x_p)(\overline{l_{qr}} y_q) \Big|
\le \sum_r \|A\| \Big( \sum_k |l_{kr} x_k|^2 \Big)^{1/2} \Big( \sum_k |l_{kr} y_k|^2 \Big)^{1/2} \\
&\le \|A\| \Big( \sum_r \sum_k |l_{kr} x_k|^2 \Big)^{1/2} \Big( \sum_r \sum_k |l_{kr} y_k|^2 \Big)^{1/2} \\
&= \|A\| \Big( \sum_k \Big( \sum_r |l_{kr}|^2 \Big) |x_k|^2 \Big)^{1/2} \Big( \sum_k \Big( \sum_r |l_{kr}|^2 \Big) |y_k|^2 \Big)^{1/2} \\
&\le \|A\| \, \max_k \Big( \sum_r |l_{kr}|^2 \Big) \Big( \sum_k |x_k|^2 \Big)^{1/2} \Big( \sum_k |y_k|^2 \Big)^{1/2} .
\end{aligned}$$

According to (4.4), $\sum_r |l_{kr}|^2 = h_{kk}$. Thus, $\max_k \big( \sum_r |l_{kr}|^2 \big) = \max_k h_{kk} = D_H$. Finally,

$$\Big| \sum_{p,q} a_{pq} h_{pq} x_p y_q \Big| \le \|A\| \cdot D_H \cdot \Big( \sum_k |x_k|^2 \Big)^{1/2} \Big( \sum_k |y_k|^2 \Big)^{1/2} , \tag{4.5}$$

where $\{x_k\}$ and $\{y_k\}$ are arbitrary sequences. This is the estimate (4.2). □

In fact, the reasoning of Schur allows us to prove a slightly more general result:



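THEOREM (The Schur factorization estimate for Schur transformators). Let $H = [h_{pq}]$ be a matrix which admits a factorization of the form

$$H = L \cdot M^* , \quad\text{i.e.,}\quad h_{pq} = \sum_r l_{pr} \overline{m_{qr}} \quad (\forall p, q), \tag{4.6}$$

where the matrices $L = [l_{pr}]$ and $M = [m_{qr}]$ satisfy the conditions

$$D_L \stackrel{\text{def}}{=} \sup_p \sum_r |l_{pr}|^2 < \infty \quad\text{and}\quad D_M \stackrel{\text{def}}{=} \sup_q \sum_r |m_{qr}|^2 < \infty . \tag{4.7}$$

Then for every matrix $A$ (of the same size as $H$) the following inequality holds:

$$\|H \circ A\|_{l^2 \to l^2} \le \sqrt{D_L \cdot D_M}\; \|A\|_{l^2 \to l^2} . \tag{4.8}$$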
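A small numerical instance may make the factorization estimate concrete. The sketch below is an added illustration, not Schur's computation: it assumes real matrices, takes the bound in the form $\sqrt{D_L D_M}\,\|A\|$ with $D_L$, $D_M$ the largest row sums of squares, and uses hypothetical helper names. The matrices $L$, $M$, $A$ are arbitrary choices, and the operator norm of a 2 x 2 matrix is computed exactly from its trace and determinant.

```python
import math

def schur_product(h, a):
    """Entrywise (Schur) product of two 2 x 2 matrices."""
    return [[h[p][q] * a[p][q] for q in range(2)] for p in range(2)]

def operator_norm_2x2(a):
    """Exact l2 operator norm (largest singular value) of a real 2 x 2 matrix."""
    t = sum(v * v for row in a for v in row)          # trace of A^T A
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]          # det A
    return math.sqrt((t + math.sqrt(t * t - 4 * d * d)) / 2)

def factor_product(l, m):
    """Return H = L M^T for real matrices L and M with the same number of columns."""
    return [[sum(lp[r] * mq[r] for r in range(len(lp))) for mq in m] for lp in l]

def row_bound(l):
    """Largest row sum of squared entries: the quantity D_L (or D_M) of the theorem."""
    return max(sum(v * v for v in row) for row in l)

L = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
M = [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
H = factor_product(L, M)            # H need not be symmetric or positive semidefinite
A = [[1.0, 2.0], [3.0, 4.0]]

lhs = operator_norm_2x2(schur_product(H, A))
rhs = math.sqrt(row_bound(L) * row_bound(M)) * operator_norm_2x2(A)
```

Taking $M = L$ recovers the positive semidefinite case, where the constant reduces to the largest diagonal entry of $H$.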
REMARK. The matrices $L$, $M$ and $H$ need not be square. The only restriction is that the matrix multiplication $L, M \to L \cdot M^*$ is feasible. In fact, the set over which the summation index $r$ runs in (4.6) need not be a subset of the set of integers. It can be of a much more general nature. Thus, for example, let $X$ be a measurable space carrying a sigma-finite non-negative measure $dx$. Let $\{l_p(x)\}$ and $\{m_q(x)\}$ be sequences of $X$-measurable functions defined on $X$ and satisfying the conditions $D_L < \infty$, $D_M < \infty$, where now

$$D_L = \sup_k \int_X |l_k(x)|^2\,dx \quad\text{and}\quad D_M = \sup_k \int_X |m_k(x)|^2\,dx . \tag{4.9}$$

Let $H$ be a matrix with entries

$$h_{pq} = \int_X l_p(x) \overline{m_q(x)}\,dx \quad (\forall p, q) \tag{4.10}$$

(i.e., the matrix $H$ admits a factorization of the form $H = L \cdot M^*$, where $L$ and $M$ are operators acting from the Hilbert space $L^2(X, dx)$ into appropriate spaces of $l^\infty$ sequences). Then the inequality (4.8) holds for an arbitrary matrix $A$ (of the appropriate size), where now $D_L$ and $D_M$ are defined in (4.9).

The last result (with $X = (a, b)$, a finite or infinite subinterval of $\mathbb{R}$, and Lebesgue measure $dx$ on $(a, b)$) appears as Theorem VI in [Sch4].



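The matrix

$$H = \left[ \frac{1}{\lambda_p + \mu_q} \right]_{1 \le p,q < \infty},$$

where $\{\lambda_k\}$ and $\{\mu_k\}$ are sequences of positive numbers that are separated from zero: $\inf_k \lambda_k > 0$, $\inf_k \mu_k > 0$, serves as an example. Here,

$$\frac{1}{\lambda_p + \mu_q} = \int_0^\infty e^{-\lambda_p x} e^{-\mu_q x}\,dx, \quad 1 \le p, q < \infty ,$$

and for this $H$ the inequality (4.2)

$$\left\| \left[ \frac{a_{pq}}{\lambda_p + \mu_q} \right] \right\|_{l^2 \to l^2} \le D_H \left\| [a_{pq}] \right\|_{l^2 \to l^2}$$

holds with

$$D_H = \frac{1}{2} \cdot \frac{1}{\sqrt{\inf_k \lambda_k}} \cdot \frac{1}{\sqrt{\inf_k \mu_k}} .$$

(This example is adopted from [Sch4]; it appears at the end of §4.)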

It is remarkable that the existence of a factorization of the form $H = L \cdot M^*$ for the matrix $H$ is not only sufficient but is also a necessary condition for the operator $A \to H \circ A$ to be a bounded operator in the space of all matrices $A$ (equipped with the operator norm in $l^2$). This converse result was proved by G. Bennett in [Ben].

THEOREM (The inversion of the Schur factorization estimate). Let a given matrix $H = [h_{pq}]$ (finite or infinite) satisfy the inequality

$$\|H \circ A\|_{l^2 \to l^2} \le D \|A\|_{l^2 \to l^2} \tag{4.11}$$

for all matrices $A$ of the same size as $H$, for some finite constant $D$ that does not depend on $A$. Then for every $\varepsilon > 0$, the matrix $H$ can be factored in the form $H = L \cdot M^*$, where the matrices $L = [l_{pr}]$ and $M = [m_{qr}]$ act from $l^2$ to $l^\infty$ and satisfy the inequality $\sqrt{D_L \cdot D_M} \le D + \varepsilon$, and the values $D_L$ and $D_M$ are defined in (4.7), i.e., $D_L = \|L\|^2_{l^2 \to l^\infty}$ and $D_M = \|M\|^2_{l^2 \to l^\infty}$.

This theorem appears as Theorem 6.4 in [Ben]. It shows that the Schur factorization gives a result which is in some sense optimal. The proof of this theorem of G. Bennett is essentially based on results obtained by A. Pietsch on absolutely summing operators in Banach spaces, see [Pie1] and [Pie2] (which, in turn, are based on fundamental results of A. Grothendieck, see the references in [Pie1] and [Pie2]).

In [Sch4], Schur considers a new class of functions of matrices, namely, the so called Schur (or Schur-Hadamard) functions of matrices. Let $A = [a_{pq}]$ be an infinite matrix whose entries have a common finite bound: $|a_{pq}| \le R$ $(\forall p, q)$, where $R < \infty$. Let $f(\cdot)$ be a function that is defined in the closed disk $\{z : |z| \le R\}$. The matrix $f^{\circ}(A)$ is defined "entrywise" as follows:

$$f^{\circ}(A) \stackrel{\text{def}}{=} [f(a_{pq})] .$$

The following result is proved in [Sch4]: Let $f(z) = \sum_{k=1}^{\infty} c_k z^k$, where $\sum_{k=1}^{\infty} |c_k| R^k < \infty$, and let the operator generated by the matrix $A$ be bounded, i.e., $\|A\|_{l^2 \to l^2} < \infty$. Then the operator generated by the matrix $f^{\circ}(A)$ is also bounded: $\|f^{\circ}(A)\|_{l^2 \to l^2} < \infty$. This result appears as Theorem IV in [Sch4].

The concept of the Schur (Schur-Hadamard) product arises in several different areas of analysis (complex function theory, Banach spaces, operator theory, multivariate analysis); see the references in the introduction to [Ben]. The paper [Sty] contains some applications of the Schur product to multivariate analysis as well as a rich bibliography of books and articles related to Schur-Hadamard products. The paper [HorR1] contains many facts about Schur-Hadamard products and Schur-Hadamard functions of matrices as well as a rich bibliography. In particular, it discusses fractional Schur-Hadamard powers of a positive matrix, infinite Schur-Hadamard divisibility of a positive matrix and its relation to the conditional positivity of the "logarithmic" matrix. Chapter 5 of the book [HorR2] (about eighty pages) is dedicated to the Schur-Hadamard product of matrices.

A very fruitful generalization of the Schur transformator is the Stieltjes double-integral operator. This notion seems to have appeared first in the papers of Yu.L. Daletskii and S.G. Krein [DaKr1], [DaKr2], [Da1], [Da2]. Later on, the theory of double-integral operators was elaborated on in great detail by M.S. Birman and M.Z. Solomyak in [BiSo1].



Let $\Lambda$ and $M$ be measurable spaces, i.e., sets provided with sigma-algebras of subsets, and let $E(d\lambda)$ and $F(d\mu)$ be two orthogonal measures in a separable Hilbert space $\mathfrak{H}$ that are defined on $\Lambda$ and $M$, respectively, i.e., weakly countably additive functions taking their values in the set of orthogonal projectors in $\mathfrak{H}$ and satisfying the condition $E(\alpha)E(\beta) = 0$ if $\alpha \cap \beta = \emptyset$ and $F(\gamma)F(\delta) = 0$ if $\gamma \cap \delta = \emptyset$. We assume also that the orthogonal measures $E(d\lambda)$ and $F(d\mu)$ are spectral measures, i.e., they also satisfy the conditions $E(\Lambda) = I$ and $F(M) = I$, where $I$ is the identity operator in $\mathfrak{H}$. If $A$ is a bounded linear operator in $\mathfrak{H}$, then

$$A = \iint_{M \times \Lambda} F(d\mu)\, A\, E(d\lambda) , \tag{4.12}$$

where the integral can be understood in any reasonable sense. The equality (4.12) can be considered as a direct generalization of the matrix representation of an operator in a Hilbert space with respect to two orthonormal bases. Namely, let the orthogonal spectral measures $E(d\lambda)$ and $F(d\mu)$ be discrete and let their "atoms" be one-dimensional orthogonal projectors, i.e., the atom of the measure $E(d\lambda)$, located at the point $\lambda \in \Lambda$, is of the form $E(\{\lambda\}) = (\,\cdot\,, e_\lambda) e_\lambda$ and the atom of the measure $F(d\mu)$, located at the point $\mu \in M$, is of the form $F(\{\mu\}) = (\,\cdot\,, f_\mu) f_\mu$, where $e_\lambda$ and $f_\mu$ are normalized vectors generating the one-dimensional subspaces $E(\{\lambda\})\mathfrak{H}$ and $F(\{\mu\})\mathfrak{H}$, respectively. The collection of all the vectors $\{e_\lambda\}$ corresponding to all the atoms of the measure $E(d\lambda)$ forms an orthonormal basis of the space $\mathfrak{H}$. Analogously, the collection of all the vectors $\{f_\mu\}$ corresponding to all the atoms of the measure $F(d\mu)$ also forms an orthonormal basis of the space $\mathfrak{H}$. Consequently, the representation (4.12) of the operator $A$ takes the form

$$A = \sum_{\lambda, \mu} a_{\mu,\lambda}\, (\,\cdot\,, e_\lambda) f_\mu , \tag{4.13}$$

where $a_{\mu,\lambda} = (A e_\lambda, f_\mu)$. Thus, in the case of discrete orthogonal spectral measures with one-dimensional atoms, the representation (4.12) turns into the matrix representation of a given operator with respect to given orthonormal bases. The matrix $[a_{\mu,\lambda}]$ corresponds to the operator $A$. If $h(\mu, \lambda)$ is a measurable function defined on $M \times \Lambda$, then the sum

$$T_h A = \sum_{\lambda, \mu} h(\mu, \lambda)\, a_{\mu,\lambda}\, (\,\cdot\,, e_\lambda) f_\mu \tag{4.14}$$

can be pictured as an application of the Schur transformator corresponding to the matrix $[h_{\mu,\lambda}]$ to the operator $A$: $A \to T_h A$. The sum on the right hand side of the equality (4.14) can be formally written as an integral:

$$T_h A = \iint_{M \times \Lambda} h(\mu, \lambda)\, F(d\mu)\, A\, E(d\lambda) . \tag{4.15}$$



However, one can consider integrals of the form (4.15) for arbitrary orthogonal spectral measures $E(d\lambda)$ on $\Lambda$ and $F(d\mu)$ on $M$, and more or less arbitrary functions $h(\mu, \lambda)$ on $M \times \Lambda$. If the integral (4.15) exists in a reasonable sense (either as a Lebesgue integral, or a Riemann-Stieltjes integral, or some other integral), it is said to be a Stieltjes double-integral operator. The problem of establishing the existence of a Stieltjes double-integral operator is intimately associated with estimates for it in various norms. In particular, the estimates

$$\|T_h A\|_{\mathfrak{H} \to \mathfrak{H}} \le C\, \|A\|_{\mathfrak{H} \to \mathfrak{H}} \tag{4.16}$$

and

$$\|T_h A\|_{\mathfrak{S}_1} \le C\, \|A\|_{\mathfrak{S}_1} \tag{4.17}$$

are extremely important. Here $\|\Phi\|_{\mathfrak{H} \to \mathfrak{H}}$ is the "uniform" norm of the operator $\Phi$, acting in $\mathfrak{H}$: $\|\Phi\|_{\mathfrak{H} \to \mathfrak{H}} = \sup_{x \in \mathfrak{H},\, x \ne 0} \|\Phi x\| / \|x\|$, and $\|\Phi\|_{\mathfrak{S}_1}$ is its "trace" norm.

In [BiSo4] the estimate (4.17) was obtained for functions $h(\cdot, \cdot)$ which admit a "factorization" of the form

$$h(\mu, \lambda) = \int_X m(\mu, x) \cdot l(\lambda, x)\,dx , \tag{4.18}$$

where $X$ is a measurable space carrying a non-negative sigma-finite measure $dx$,

$$C_m = \operatorname*{ess\,sup}_{\mu \in M} \int_X |m(\mu, x)|^2\,dx , \quad C_l = \operatorname*{ess\,sup}_{\lambda \in \Lambda} \int_X |l(\lambda, x)|^2\,dx , \tag{4.19}$$

and

$$C = \sqrt{C_m C_l} < \infty . \tag{4.20}$$

The inequality (4.16) is then obtained (with the same constant $C$) by invoking the duality between the set of all bounded operators in $\mathfrak{H}$ and the set $\mathfrak{S}_1$ of all trace class operators. The estimate (4.17) holds with the same constant $C$ (that is given in (4.20)). Unfortunately, the paper [BiSo4] is not translated into English, but some results of this paper, in particular, the estimate (4.16), (4.20), are reproduced in [ABF], Section 2.

The estimate (4.16) is a direct analog of the Schur factorization estimate (4.8), (4.7) and is obtained by the same method that Schur used. However, when Birman and Solomyak started to develop the theory of Stieltjes double-integral operators, they were not aware of the paper [Sch4] by Schur. The close relationship between double-integral operators and the results of Schur was only discovered later. In Section 2 of [Pel], V. Peller obtained a result that "inverts" the estimate (4.16) by Birman and Solomyak in the same sense that Theorem 6.4 of [Ben] (that was stated earlier) inverts the factorization estimate by Schur. Peller proved an even stronger result, a "maximal" version of the inverse result. Namely, he proved that if the function $h$ is such that the estimate (4.16) holds for every bounded operator $A$ in $\mathfrak{H}$ with a finite constant $C$ that is independent of $A$, then the function $h(\cdot, \cdot)$ admits a factorization of the form (4.18), where the functions $m(\cdot, \cdot)$ and $l(\cdot, \cdot)$ satisfy the conditions

$$\int_X \Big( \operatorname*{ess\,sup}_{\mu \in M} |m(\mu, x)|^2 \Big)\,dx < \infty \quad\text{and}\quad \int_X \Big( \operatorname*{ess\,sup}_{\lambda \in \Lambda} |l(\lambda, x)|^2 \Big)\,dx < \infty . \tag{4.21}$$

The estimate (4.16), (4.20) is "semi-effective": given the function $h(\mu, \lambda)$, it is not so easy to see when it admits a factorization of the form (4.18). To overcome this difficulty, Birman and Solomyak developed another approach that reduces the study of Stieltjes double-integral operators to the study of integral operators of the form

$$u(\lambda) \to v(\mu) = \int_\Lambda h(\mu, \lambda)\, u(\lambda)\, \rho(d\lambda) . \tag{4.22}$$

This reduction is explained in [BiSo2], Theorem 2, and also in [BiSo4], Lemma 1.1. The operator (4.22) acts from the space $L^2(\Lambda, \rho(d\lambda))$ into the space $L^2(M, \sigma(d\mu))$, where $\rho(d\lambda) = (E(d\lambda)\omega, \omega)$, $\sigma(d\mu) = (F(d\mu)\theta, \theta)$ and $\omega, \theta \in \mathfrak{H}$. The estimates for the integral operators (4.22) must be carried out for all vectors $\omega, \theta \in \mathfrak{H}$ and must be uniform with respect to the measures $\rho(d\lambda)$ and $\sigma(d\mu)$. To obtain such estimates, Birman and Solomyak developed a method that is based on the approximation of functions from the Sobolev-Slobodetskii classes by piecewise-polynomial functions, [BiSo5], [BiSo6], [BiSo7], §§8-9, [BiSo8], Chapter 3, §§5-7. In the construction of the approximating functions, a partition of the domain of definition of the approximated function appears. To achieve the desired uniformity of the approximation with respect to the measures $\rho(d\lambda)$ and $\sigma(d\mu)$, this partition must be adapted to these measures.

The approach, based on piecewise-polynomial approximations, allows one to approximate the kernels of the integral operators (4.22) by finite-dimensional kernels, and thus to obtain the needed estimates for the singular values of the Stieltjes double-integral operators. The estimates of the double-integral operators are made not only in the uniform and trace norms, but also in many other norms. These estimates depend upon the smoothness of the function $h(\cdot, \cdot)$ (assuming that $\Lambda$ and $M$ are smooth manifolds).

Double-integral operators appear in the formula for differentiating functions of Hermitian operators with respect to a parameter. Namely, let $\tau \to H(\tau)$ be a function on some open subinterval of the real axis $\mathbb{R}$ whose values are self-adjoint operators in a Hilbert space $\mathfrak{H}$. Let $f : \mathbb{R} \to \mathbb{R}$ be a real-valued function that is defined and bounded on $\mathbb{R}$ and let $E(d\lambda, \tau)$ be the spectral measure of the operator $H(\tau)$. Under appropriate assumptions, Yu.L. Daletskii and S.G. Krein, [DaKr1], obtained the formula

$$\frac{\partial f(H(\tau))}{\partial \tau} = \iint_{\mathbb{R} \times \mathbb{R}} \frac{f(\lambda) - f(\mu)}{\lambda - \mu}\, E(d\mu, \tau)\, \frac{\partial H(\tau)}{\partial \tau}\, E(d\lambda, \tau) . \tag{4.23}$$

This formula, which expresses the derivative $\frac{\partial f(H(\tau))}{\partial \tau}$ as a Stieltjes double-integral operator, seems to be the first recorded application of Stieltjes double-integral operators.
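As a sanity check on (4.23), the following worked case may help. It is an illustration added here, not taken from [DaKr1]: it assumes a differentiable family $H(\tau)$ of finite Hermitian matrices and the function $f(\lambda) = \lambda^2$ (which is not bounded on $\mathbb{R}$, so only the algebra is being checked), and it abbreviates $\partial H(\tau)/\partial\tau$ by $H'$.

```latex
\frac{f(\lambda)-f(\mu)}{\lambda-\mu} = \lambda+\mu, \qquad
\iint_{\mathbb{R}\times\mathbb{R}} (\lambda+\mu)\,E(d\mu,\tau)\,H'\,E(d\lambda,\tau)
 = \Big(\int \mu\,E(d\mu,\tau)\Big) H' \Big(\int E(d\lambda,\tau)\Big)
 + \Big(\int E(d\mu,\tau)\Big) H' \Big(\int \lambda\,E(d\lambda,\tau)\Big)
 = H H' + H' H = \frac{\partial}{\partial\tau}\, H(\tau)^2 .
```

The result agrees with the product rule for $H(\tau)^2$, in which the order of the non-commuting factors $H$ and $H'$ must be kept.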






The paper [Da1] contains a version of Taylor's formula for operator functions. The paper [DaKr2] (and, to some extent, the paper [Da2]) contains a more detailed presentation of the results of the papers [DaKr1] and [Da1] as well as some extensions. Later on, Stieltjes double-integral operators were widely used in scattering theory. M.Sh. Birman, [Bi1], used them to prove the existence of wave operators. (See also [BiSo2], especially the last paragraph of this paper.) Double-integral operators are involved in the study of the so called spectral shift function (see [BiSo10] and [BiYa]). The paper [BiSo11] is devoted to the application of double-integral operators to the estimation of perturbations and commutators of functions of self-adjoint operators. It is worth noticing that double-integral operators allow one to make an abstract and symmetric definition of a pseudodifferential operator with prescribed symbol (see item 3 of the paper [BiSo9]).

Thus, the ideas of Issai Schur on the termwise multiplication of matrices, partially 
forgotten and rediscovered, are seen to lead very far from the original setting. 
5. The Schur Convexity Theorem.



The well known Hadamard inequality states that

$$\det H \le \prod_{1 \le k \le n} h_{kk} \tag{5.1}$$

for every non-negative definite Hermitian matrix $H = [h_{jk}]_{1 \le j,k \le n}$. (There are many proofs; see, for example, [HoJo], Section 7.8.) In a short but penetrating paper published in 1923, Issai Schur [Sch18] gave a highly effective method for deriving this inequality. However, the importance of the paper [Sch18] rests primarily on the ideas which are contained there and on the impact which the paper had on various areas of mathematics, some of which lie very far from the original setting. This paper has generated and continues to generate many fruitful investigations.
Given a Hermitian matrix $H = [h_{jk}]_{1 \le j,k \le n}$, it can be reduced to the diagonal form

$$H = U\, \mathrm{diag}(\omega_1, \dots, \omega_n)\, U^* , \tag{5.2}$$

where $\omega_1, \dots, \omega_n$ are the eigenvalues of the matrix $H$, and $U = [u_{jk}]_{1 \le j,k \le n}$ is a unitary matrix. (If the Hermitian matrix $H$ is real, then the matrix $U$ can be chosen real also, i.e., if $H$ is real and symmetric, then $U$ is orthogonal.) In particular, the equality (5.2) implies that

$$\begin{bmatrix} h_{11} \\ \vdots \\ h_{nn} \end{bmatrix} = \begin{bmatrix} |u_{11}|^2 & \cdots & |u_{1n}|^2 \\ \vdots & & \vdots \\ |u_{n1}|^2 & \cdots & |u_{nn}|^2 \end{bmatrix} \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_n \end{bmatrix} . \tag{5.3}$$

Since the matrix $U$ in (5.2) is unitary (orthogonal), the matrix $M = [m_{jk}]_{1 \le j,k \le n}$ with

$$m_{jk} = |u_{jk}|^2 , \tag{5.4}$$

as in (5.3), possesses the properties

$$\begin{array}{lll}
\text{i.} & m_{jk} \ge 0, & 1 \le j, k \le n; \\
\text{ii.} & \sum_{1 \le k \le n} m_{jk} = 1, & 1 \le j \le n; \\
\text{iii.} & \sum_{1 \le j \le n} m_{jk} = 1, & 1 \le k \le n.
\end{array} \tag{5.5}$$

It turns out to be fruitful to consider linear transformations whose matrices $M$ satisfy the conditions (5.5), without regard to the relations (5.4).

DEFINITION 1. A matrix $M = [m_{jk}]_{1 \le j,k \le n}$ is said to be doubly-stochastic if the conditions (5.5) are fulfilled.

DEFINITION 2. A matrix $M = [m_{jk}]_{1 \le j,k \le n}$ is said to be ortho-stochastic if there exists an orthogonal matrix $U = [u_{jk}]_{1 \le j,k \le n}$ such that the matrix entries $m_{jk}$ are representable in the form (5.4), i.e., if $M$ is the Schur product of an orthogonal matrix $U$ with itself.

REMARK 1. It is clear that every ortho-stochastic matrix is doubly-stochastic. However, not every doubly-stochastic matrix is ortho-stochastic. For example, the matrix

$$P = \frac{1}{6} \begin{bmatrix} 0 & 3 & 3 \\ 3 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix}$$

is doubly-stochastic, but not ortho-stochastic. (This example is adopted from [Sch18].)

Many well known elementary inequalities can be put in the form

$$\Phi(\bar{x}, \dots, \bar{x}) \le \Phi(x_1, \dots, x_n), \tag{5.6}$$

where $\bar{x} = (x_1 + \cdots + x_n)/n$ and $x_1, \dots, x_n$ lie in a specified set. For example, the inequality

$$\varphi(\bar{x}) \le (\varphi(x_1) + \cdots + \varphi(x_n))/n \tag{5.7}$$

for a convex function $\varphi$ of one variable can be written in the form (5.6), with $\Phi(\xi_1, \dots, \xi_n) = \varphi(\xi_1) + \cdots + \varphi(\xi_n)$.
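Returning to Definitions 1 and 2 and the 3 x 3 example of Remark 1, both properties can be tested by machine for small matrices. The sketch below is an added illustration with hypothetical function names: the ortho-stochastic test simply tries every choice of signs for the entries $\pm\sqrt{m_{jk}}$, which is only practical for very small $n$, and it uses a floating-point tolerance for the orthogonality check.

```python
import itertools
import math
from fractions import Fraction

def is_doubly_stochastic(m):
    """Check conditions (5.5): non-negative entries, unit row sums, unit column sums."""
    n = len(m)
    entries_ok = all(v >= 0 for row in m for v in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[j][k] for j in range(n)) == 1 for k in range(n))
    return entries_ok and rows_ok and cols_ok

def is_ortho_stochastic(m, tol=1e-9):
    """Search all sign patterns s_jk = +-1 for an orthogonal U = [s_jk * sqrt(m_jk)]."""
    n = len(m)
    root = [[math.sqrt(v) for v in row] for row in m]
    for signs in itertools.product((1.0, -1.0), repeat=n * n):
        u = [[signs[j * n + k] * root[j][k] for k in range(n)] for j in range(n)]
        # For a doubly-stochastic m the rows of u already have unit length,
        # so it is enough to test that distinct rows are orthogonal.
        if all(abs(sum(u[i][k] * u[j][k] for k in range(n))) < tol
               for i in range(n) for j in range(i + 1, n)):
            return True
    return False

# The matrix of Remark 1, with exact rational entries.
P = [[Fraction(v, 6) for v in row] for row in ([0, 3, 3], [3, 1, 2], [3, 2, 1])]
# Entrywise squares of the rotation by 45 degrees.
Q = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 2)]]
```

For `P` the first two rows can never be made orthogonal, because the first row has a zero in the position where the second row would need a cancelling term.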

We recall that a real valued function $\varphi$, defined on a subinterval $(\alpha, \beta)$ of the real axis, is said to be convex if $\varphi$ is continuous there and the inequality $\varphi((x_1 + x_2)/2) \le (\varphi(x_1) + \varphi(x_2))/2$ holds for every $x_1, x_2 \in (\alpha, \beta)$. The inequality (5.7) is a special case of the so-called

JENSEN INEQUALITY. Let $\varphi$ be a convex function on an interval $(\alpha, \beta)$, let $x_1, \dots, x_n$ be points in the interval $(\alpha, \beta)$, and let the numbers $\lambda_1, \dots, \lambda_n$ satisfy the conditions

$$\begin{array}{lll}
\text{i.} & \lambda_k \ge 0, & 1 \le k \le n; \\
\text{ii.} & \sum_{1 \le k \le n} \lambda_k = 1. &
\end{array} \tag{5.8}$$

Then

$$\varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n) . \tag{5.9}$$

The value $\bar{x} = (x_1 + \cdots + x_n)/n$ that appears in (5.7), the so called arithmetic mean of the values $x_1, \dots, x_n$, is the most commonly used average value for $x_1, \dots, x_n$. The value $\lambda_1 x_1 + \cdots + \lambda_n x_n$ that appears in (5.9), the so-called weighted arithmetic mean, is a more general average value for $x_1, \dots, x_n$.

In [Sch18], doubly-stochastic matrices $M = [m_{jk}]_{1 \le j,k \le n}$ are used to construct an average sequence $y_1, \dots, y_n$ from a given sequence of real or complex numbers $x_1, \dots, x_n$ by the averaging rule

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & & \vdots \\ m_{n1} & \cdots & m_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} . \tag{5.10}$$

It is intuitively clear that the sequence of "averaged" values $\{y_k\}$ is "less spread out" than the original sequence $\{x_k\}$. In [Sch18], inequalities of the form

$$\Phi(y_1, \dots, y_n) \le \Phi(x_1, \dots, x_n) \tag{5.11}$$

are considered for points $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ in the domain of definition of the function $\Phi$ that are related by a doubly stochastic matrix $M = [m_{jk}]_{1 \le j,k \le n}$ by means of the averaging procedure $y = Mx$ given in (5.10). In particular, the inequality (5.11) is established there for functions $\Phi$ of the form $\Phi(\xi_1, \dots, \xi_n) = \varphi(\xi_1) + \cdots + \varphi(\xi_n)$:

THEOREM I. Let $\varphi$ be a convex function defined on a subinterval $(\alpha, \beta)$ of the real axis, let $x_1, \dots, x_n$ be arbitrary numbers from $(\alpha, \beta)$, let $M = [m_{jk}]$ be a doubly stochastic matrix and let the numbers $y_1, \dots, y_n$ be obtained from the averaging procedure $y = Mx$. Then

$$\varphi(y_1) + \cdots + \varphi(y_n) \le \varphi(x_1) + \cdots + \varphi(x_n) . \tag{5.12}$$

PROOF of Theorem I. In view of the conditions (5.5 i) and (5.5 ii), Jensen's inequality is applicable with $\lambda_k = m_{jk}$, $k = 1, \dots, n$, and implies that

$$\varphi(y_j) = \varphi(m_{j1} x_1 + \cdots + m_{jn} x_n) \le m_{j1} \varphi(x_1) + \cdots + m_{jn} \varphi(x_n) .$$

The desired conclusion is now obtained by summing the last inequality over $j$ from $1$ to $n$ and invoking the condition (5.5 iii). □

The preceding theorem appears as Theorem V in [Sch18] and is used there to derive the (Hadamard) inequality

$$\prod_{1 \le k \le n} \omega_k \le \prod_{1 \le k \le n} h_{kk}$$

for a positive definite Hermitian matrix $H = [h_{jk}]_{1 \le j,k \le n}$ with eigenvalues $\omega_1, \dots, \omega_n$. The latter is equivalent to the inequality

$$\sum_{1 \le k \le n} (-\log h_{kk}) \le \sum_{1 \le k \le n} (-\log \omega_k) , \tag{5.13}$$

which is of the form (5.12), with the convex function $\varphi(\xi) = -\log \xi$. In this case, the averaging doubly-stochastic matrix $M = [m_{jk}]$ is the ortho-stochastic one, with entries $m_{jk}$ of the form (5.4), as in (5.3).
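Theorem I with $\varphi(\xi) = -\log \xi$ is easy to try out numerically. The sketch below is an added illustration with hypothetical names: the doubly-stochastic matrix and the sequence $x$ are arbitrary choices, floating-point arithmetic is used throughout, and the 2 x 2 Hadamard check uses a hand-picked matrix whose determinant is computed directly.

```python
import math

def average(m, x):
    """Apply the averaging rule y = M x of (5.10)."""
    return [sum(m_jk * x_k for m_jk, x_k in zip(row, x)) for row in m]

def phi_sum(phi, values):
    """Return phi(v_1) + ... + phi(v_n)."""
    return sum(phi(v) for v in values)

def neg_log(t):
    """The convex function phi(t) = -log t on the positive half-line."""
    return -math.log(t)

# A doubly-stochastic matrix (rows and columns sum to one) and a positive sequence x.
M = [[0.0, 0.5, 0.5], [0.5, 1 / 6, 1 / 3], [0.5, 1 / 3, 1 / 6]]
x = [1.0, 2.0, 6.0]
y = average(M, x)

lhs = phi_sum(neg_log, y)   # phi(y_1) + ... + phi(y_n)
rhs = phi_sum(neg_log, x)   # phi(x_1) + ... + phi(x_n)

# Hadamard's inequality for the positive definite matrix H = [[2, 1], [1, 3]]:
# the determinant (the product of the eigenvalues) against the product of the diagonal.
det_H = 2 * 3 - 1 * 1
diag_product = 2 * 3
```

Averaging preserves the sum of the sequence while pulling its terms together, which is why the product of the $y_k$ is at least the product of the $x_k$.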

In [Sch18], functions $\Phi$ of several variables for which inequalities of the form (5.11) hold are also considered.

DEFINITION 3. A function $\Phi$ of $n$ variables $x_1, \dots, x_n$ is said to be S-convex (i.e., convex in the sense of Schur) if for every doubly-stochastic matrix $M$ and every pair of points $x = (x_1, \dots, x_n)$ and $y = Mx$ in the domain of $\Phi$, the inequality (5.11) holds. The function $\Phi$ is said to be S-concave if the opposite inequality holds, i.e., if $\Phi(x_1, \dots, x_n) \le \Phi(y_1, \dots, y_n)$ holds for every pair of points $x$ and $y = Mx$ in the domain of $\Phi$. A function $\Phi$ is S-concave if and only if the function $-\Phi$ is S-convex.

Let $\pi$ be a permutation of the set $\{1, \dots, n\}$. Then the corresponding operator on $\mathbb{R}^n$ that permutes coordinates according to the rule $(x_1, \dots, x_n) \to (x_{\pi(1)}, \dots, x_{\pi(n)})$ is linear. Its matrix with respect to the standard basis in $\mathbb{R}^n$ is termed a permutation matrix and is of the form

$$P_\pi = [(p_\pi)_{jk}]_{1 \le j,k \le n}, \quad\text{where, for } j, k = 1, \dots, n, \quad (p_\pi)_{jk} = \begin{cases} 1 & \text{if } k = \pi(j), \\ 0 & \text{if } k \ne \pi(j). \end{cases} \tag{5.14}$$

There are $n!$ permutation matrices of size $n \times n$. Every permutation matrix is a doubly-stochastic one. The inverse of a permutation matrix is a permutation matrix as well, and hence it is also doubly-stochastic. Therefore,

Every S-convex function $\Phi$ of $n$ variables is a symmetric function:

$$\Phi(x_1, \dots, x_n) = \Phi(x_{\pi(1)}, \dots, x_{\pi(n)}) \quad\text{for every permutation } \pi . \tag{5.15}$$



THEOREM II. Let $\Phi$ be an S-convex function of $n$ variables, $n \ge 2$, and let all its partial derivatives of the first order exist and be continuous. Then the function $\Phi$ satisfies the condition

$$\frac{\partial \Phi}{\partial x_1}(x_1, x_2, \dots, x_n) - \frac{\partial \Phi}{\partial x_2}(x_1, x_2, \dots, x_n) \ge 0, \quad\text{if } x_1 > x_2 . \tag{5.16}$$

This theorem provides a necessary condition for a symmetric function $\Phi$ to be S-convex. It appears as Theorem I in [Sch18]. Theorem II in [Sch18] also contains a sufficient condition for a symmetric function $\Phi$ to be S-convex.

THEOREM III. Let $\Phi$ be a symmetric function of $n$ variables, $n \ge 2$, that satisfies the condition

$$\Big( \frac{\partial^2 \Phi}{\partial x_1^2} + \frac{\partial^2 \Phi}{\partial x_2^2} - 2 \frac{\partial^2 \Phi}{\partial x_1 \partial x_2} \Big)(x_1, x_2, \dots, x_n) \ge 0 \quad\text{for all } x_1, x_2, \dots, x_n . \tag{5.17}$$

Then the function $\Phi$ is S-convex.

However, A. Ostrowski showed that condition (5.16) is both necessary and sufficient for a symmetric function $\Phi$ to be S-convex; see Theorem VIII in [Ostr]. The reasoning in [Ostr] is based essentially on the reasoning in [Sch18], but is more precise.
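Two simple cases show how the conditions (5.16) and (5.17) are used in practice. They are worked examples added here for illustration, with the functions chosen arbitrarily: the sum of squares, and the product of two variables.

```latex
\Phi(x_1,\dots,x_n) = x_1^2 + \cdots + x_n^2 : \qquad
\frac{\partial\Phi}{\partial x_1} - \frac{\partial\Phi}{\partial x_2} = 2(x_1 - x_2) \ge 0 \ \text{ if } x_1 > x_2,
\qquad
\frac{\partial^2\Phi}{\partial x_1^2} + \frac{\partial^2\Phi}{\partial x_2^2}
 - 2\frac{\partial^2\Phi}{\partial x_1\partial x_2} = 2 + 2 - 0 = 4 \ge 0 ;

\Phi(x_1,x_2) = x_1 x_2 : \qquad
\frac{\partial\Phi}{\partial x_1} - \frac{\partial\Phi}{\partial x_2} = x_2 - x_1 < 0 \ \text{ if } x_1 > x_2 .
```

The first function satisfies both conditions, in agreement with Theorem I for $\varphi(\xi) = \xi^2$. For the second, it is $-\Phi$ that satisfies (5.16), so the product $x_1 x_2$ is S-concave.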

In [Sch18] it is shown that the elementary symmetric functions $c_k(x_1, \dots, x_n)$, $k = 1, \dots, n$, are S-concave, and that the functions $\dfrac{c_{k+1}(x_1, \dots, x_n)}{c_k(x_1, \dots, x_n)}$, $k = 1, \dots, n-1$, are S-concave.

To this point, we have reviewed almost all the main results of the short paper |Schl8j . 
The significance of this paper is not confined to these results, important as they are, 
but rests primarily on the fact that linear transformations with doubly-stochastic ma- 
trices were introduced there. This paper attracted the attention of mathematicians to 
doubly-stochastic matrices. (In [BeBej the term "Schur transformation" is used for linear 
transformations with such matrices; see |BeBe] . Chapter I, § 29.) Schur himself did not 
use the term doubly-stochastic matrix. He just referred to "a matrix M that satisfies the 
conditions (15. 5p ." The term "doubly-stochastic matrix" seems to have appeared first in 
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the first edition of the book |Felj by W. Feller, in 19500. 

Many results were infiuenced by the paper [SchlSj . We shall begin with the theorems 
of Hardy-Littlewood-Polya and Birkhoff. 

To formulate the Hardy-Littlewood-Polya Theorem, we have to introduce the notion of 
majorization. Let ,^1, ^2, • • • be a sequence of real numbers. By ,^2, • • • , C "^^ 
denote the reaarangement of this sequence in non-increasing order: 

^1 > 'C2 — • • • — ^fc = ^TT(k) for some permutation vr of the set of indices 1,2 . . . ,n . 

DEFINITION 4. Let $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ be two
sequences of real numbers. Then we say that the sequence $y$ is majorized by the sequence
$x$ (or that the sequence $x$ majorizes the sequence $y$) if the following conditions are
satisfied:

$$
\begin{aligned}
y_1^* + y_2^* + \dots + y_k^* &\le x_1^* + x_2^* + \dots + x_k^* \qquad (k = 1, 2, \dots, n-1);\\
y_1^* + y_2^* + \dots + y_{n-1}^* + y_n^* &= x_1^* + x_2^* + \dots + x_{n-1}^* + x_n^*.
\end{aligned}
\tag{5.18}
$$

A relation of the form (5.18) is said to be a majorization relation and is denoted by the
symbol

$$y \prec x, \quad \text{or} \quad (y_1, y_2, \dots, y_n) \prec (x_1, x_2, \dots, x_n). \tag{5.19}$$

The relations (5.18) were considered by R. F. Muirhead [Muir] and by M. O. Lorenz [Lor]
at the beginning of the 20th century. Muirhead introduced these relations (with integer
$x_k$, $y_k$ only) to study inequalities for homogeneous symmetric functions (Muirhead's
result is also presented in [HLP], Chapter II, sec. 2.18). Lorenz used the relations (5.18)
to describe the non-uniformity of the distribution of wealth in a population. However, the
notation (5.19) and the term "majorization" were introduced by G. H. Hardy, J. E. Littlewood
and G. Polya in 1934; see [HLP], Sec. 2.18. Chapter II of the book [HLP], in which
majorization is introduced and discussed, contains a number of references to private
communications by Schur.
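The conditions in Definition 4 translate directly into a small test. The following is only a sketch: the function name `is_majorized` and the floating-point tolerance `tol` are illustrative assumptions, not notation from the papers cited above.

```python
from typing import Sequence


def is_majorized(y: Sequence[float], x: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if y is majorized by x, i.e. if y and x satisfy (5.18)."""
    if len(x) != len(y):
        return False
    xs = sorted(x, reverse=True)  # non-increasing rearrangement x_1^* >= x_2^* >= ...
    ys = sorted(y, reverse=True)  # non-increasing rearrangement y_1^* >= y_2^* >= ...
    partial_x = 0.0
    partial_y = 0.0
    for k in range(len(xs) - 1):
        partial_x += xs[k]
        partial_y += ys[k]
        if partial_y > partial_x + tol:  # a partial-sum inequality fails
            return False
    # the total sums must coincide
    return abs(sum(xs) - sum(ys)) <= tol
```

For instance, the constant sequence (2, 2, 2) is majorized by (1, 2, 3), whereas (1, 2, 3) is not majorized by (2, 2, 2).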

THEOREM (G. H. Hardy, J. E. Littlewood and G. Polya, [HLP], sec. 2.20)

I. Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be two sequences of real numbers
and let $M$ be a doubly-stochastic matrix such that $y = Mx$. Then $y \prec x$.

II. Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be two sequences of real numbers
such that $y \prec x$. Then there exists a doubly-stochastic matrix $M$ such that $y = Mx$.
(In general such a matrix $M$ is not unique.)
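Part I of the theorem can be checked numerically on a small example: averaging a vector with a doubly-stochastic matrix produces a vector that is majorized by the original one. The sketch below is illustrative only; the helper names and the sample matrix `M` are assumptions chosen for the example.

```python
from typing import List, Sequence


def apply_matrix(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Return the vector Mx."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def is_doubly_stochastic(M: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Non-negative entries, every row sum and every column sum equal to 1."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    if any(v < -tol for row in M for v in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in M)
    cols_ok = all(abs(sum(M[j][k] for j in range(n)) - 1.0) <= tol for k in range(n))
    return rows_ok and cols_ok


def is_majorized(y: Sequence[float], x: Sequence[float], tol: float = 1e-9) -> bool:
    """Check the partial-sum conditions (5.18) for y and x."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if len(xs) != len(ys):
        return False
    for k in range(1, len(xs)):
        if sum(ys[:k]) > sum(xs[:k]) + tol:
            return False
    return abs(sum(xs) - sum(ys)) <= tol


# Sample data (chosen for illustration only).
M = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]
x = [4.0, 1.0, -2.0]
y = apply_matrix(M, x)  # the averaged vector Mx
```

Here `y` equals (2.5, 0.25, 0.25), which is majorized by `x`, while `x` is not majorized by `y`.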

Footnote: The term "stochastic matrix", however, was used as early as 1931 in [Rom1] (see
also [Rom2]) for matrices satisfying the conditions (5.5 i) and (5.5 ii) only (but not
necessarily the condition (5.5 iii)). Such matrices play a crucial role in the theory of
Markov chains.

Part II of this theorem and the first cited theorem of Schur (which appears as Theorem
I in this section) imply the following result:

THEOREM I'. Let a sequence $y = (y_1, \dots, y_n)$ be majorized by a sequence
$x = (x_1, \dots, x_n)$, let $x_k, y_k \in (\alpha, \beta) \subset \mathbb{R}$ for
$k = 1, \dots, n$, and let $\varphi$ be a convex function on the interval $(\alpha, \beta)$.
Then the inequality (5.12) holds.

It turns out that the converse statement is true ([HLP1], Theorem 8; [HLP], Theorem 108):

Let $x_k, y_k \in (\alpha, \beta)$ for $k = 1, \dots, n$ and assume that the inequality
(5.12) holds for every function $\varphi$ which is convex on the interval $(\alpha, \beta)$.
Then $y = Mx$ for some doubly-stochastic matrix $M$.

This means that Schur's result (which appears as Theorem I in this section) is sharp in
some sense.

In [GoKr1], Chapt. II, Lemma 3.5, a very elementary proof of the following fact is
presented: Let $\Phi$ be a symmetric function of $n$ variables which has continuous
derivatives of the first order. Assume that the condition (5.16) is satisfied. If a sequence
$x = (x_1, \dots, x_n)$ of real numbers majorizes a sequence $y = (y_1, \dots, y_n)$, then
the inequality $\Phi(y_1, \dots, y_n) \le \Phi(x_1, \dots, x_n)$ holds.

The last result combined with the Hardy-Littlewood-Polya theorem that was discussed
earlier yields an independent proof of the fact that a symmetric function $\Phi$ that
satisfies the condition (5.16) is S-convex.

The theorem by G. Birkhoff sheds light on geometric aspects of majorization and Schur
averaging. It is clear that the set of all doubly-stochastic matrices is compact and convex.
Therefore, it is of interest to find the extreme points of this set. It is clear that
permutation matrices are doubly-stochastic and that they are extreme points. It turns out
that they are the only extreme points.

THEOREM (G. Birkhoff). Every doubly-stochastic matrix $M = [m_{jk}]_{1 \le j,k \le n}$ is
representable as a convex combination of permutation matrices:

$$M = \sum_{\pi \in S_n} \lambda_\pi P_\pi, \tag{5.20}$$

where $\pi$ runs over the set $S_n$ of all permutations of the set $\{1, \dots, n\}$, $P_\pi$
are the corresponding permutation matrices (5.14), and the coefficients
$\lambda_\pi = \lambda_\pi(M)$ satisfy the conditions

$$\lambda_\pi \ge 0, \qquad \sum_{\pi \in S_n} \lambda_\pi = 1. \tag{5.21}$$

REMARK 2. In general, the coefficients $\lambda_\pi(M)$ in the representation (5.20) are not
uniquely determined from the matrix $M$.



This theorem was formulated and proved in 1946 in the paper [Birk1]. (This formulation
also appeared in Example 4* in [Birk2], p. 266.) The original proof due to Birkhoff is
based on a theorem by Ph. Hall on representatives of subsets, [HalP]. (The latter theorem
can also be found in [HalM], sec. 5.1.) G. B. Dantzig [Dan] gives an algorithm for solving
a transportation problem, the solution of which leads to Birkhoff's theorem. An independent
proof of Birkhoff's theorem was given by J. von Neumann [NeuJ1] in the setting of game
theory. "Combinatorial" proofs of Birkhoff's theorem (based on Ph. Hall's theorem) are
presented in the books of M. Hall [HalM] (see Theorem 5.1.9), and C. Berge [Ber] (see
Theorem 11 in Chapt. 10). A geometric proof (based on a direct investigation of extreme
points) is presented in [HoJo], Theorem 8.7.1. Two different proofs of Birkhoff's theorem
are presented in [MaOl], Chapt. 2, Sect. F. The paper [Mir] is a good survey of
doubly-stochastic matrices. In particular, it contains a proof of Birkhoff's theorem.
See also the problem book by I. M. Glazman and Yu. I. Lyubich [GlLy], Ch. 7, §4, where
Birkhoff's theorem is presented in problem form.
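The combinatorial proofs mentioned above suggest a simple constructive procedure: repeatedly find a permutation supported on the positive entries of the matrix (such a permutation exists by Ph. Hall's theorem) and subtract the largest possible multiple of the corresponding permutation matrix. The sketch below is one way this greedy procedure could be written; it is not taken from any of the cited sources. The function names are hypothetical, a permutation is stored as the map from row index to column index (which may differ from the convention of (5.14)), and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def find_positive_permutation(W: List[List[Fraction]]) -> Optional[List[int]]:
    """Find pi with W[i][pi[i]] > 0 for all i (bipartite matching by augmenting paths)."""
    n = len(W)
    row_of_col = [-1] * n  # row_of_col[j] = row currently matched to column j

    def augment(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if W[i][j] > 0 and not seen[j]:
                seen[j] = True
                if row_of_col[j] == -1 or augment(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None  # no perfect matching on the positive entries
    pi = [0] * n
    for j, i in enumerate(row_of_col):
        pi[i] = j
    return pi


def birkhoff_decomposition(M: Sequence[Sequence[Fraction]]) -> List[Tuple[Fraction, Tuple[int, ...]]]:
    """Return pairs (lambda, pi) with M = sum of lambda * P_pi, lambda > 0, sum of lambda = 1."""
    W = [[Fraction(v) for v in row] for row in M]  # working copy
    n = len(W)
    terms: List[Tuple[Fraction, Tuple[int, ...]]] = []
    while any(v > 0 for row in W for v in row):
        pi = find_positive_permutation(W)
        if pi is None:
            raise ValueError("the input matrix is not doubly stochastic")
        lam = min(W[i][pi[i]] for i in range(n))  # largest multiple that keeps W >= 0
        for i in range(n):
            W[i][pi[i]] -= lam  # at least one more entry becomes zero
        terms.append((lam, tuple(pi)))
    return terms
```

Each pass zeroes at least one entry, so at most $n^2$ permutation matrices are produced. Different choices of the permutation at each step can lead to different coefficients, in line with Remark 2.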

Let $x = (x_1, \dots, x_n)$ be a sequence of real numbers and, for a permutation $\pi$ of the
set $\{1, \dots, n\}$, let $x_\pi = (x_{\pi(1)}, \dots, x_{\pi(n)})$. (Thus, for given $x$
there are $n!$ sequences $x_\pi$, some of which can coincide.) We consider these sequences as
vectors in $\mathbb{R}^n$. Let $C_x$ denote the convex hull of all the vectors $x_\pi$ where
$\pi \in S_n$.

THEOREM (R. Rado). Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be two sequences
of real numbers. Then

$$y \in C_x \iff y \prec x.$$

PROOF. The implication $\Longrightarrow$ is easy. The converse can be obtained by combining
the cited theorems of Hardy-Littlewood-Polya and Birkhoff. □

This theorem seems to have been established first by R. Rado [Rad]. His proof was based
on a theorem on the separation of convex sets by hyperplanes. A. Horn ([HorA1], Theorem
2) observed that it can also be obtained by combining the results of Hardy-Littlewood-Polya
and Birkhoff that were cited earlier. A short proof of Rado's theorem, which does not use
the Birkhoff theorem, can be found in [Mark] (see Theorem 1.1).

The circle of ideas related to Schur averaging, majorization and Birkhoff's theorem is
well represented in the literature. The whole book [MaOl] (of more than 550 pages) is
dedicated to this circle. It includes applications to combinatorial analysis, matrix theory,
numerical analysis and statistics. The books [ArnB] and [PPT] are also relevant. There
are generalizations of Birkhoff's theorem to the infinite dimensional case, see [Mir] and

One generalization of Birkhoff's theorem leads to an interpolation theorem for linear
operators. Let $B$ be the linear space $\mathbb{R}^n$ provided with a norm $\|\cdot\|_B$
such that $\|x\|_B = \|x_\pi\|_B$ for every $x \in \mathbb{R}^n$ and for every permutation
$\pi \in S_n$, where, as usual, $x_\pi = (x_{\pi(1)}, \dots, x_{\pi(n)})$. In other words,
this property of the norm $\|\cdot\|_B$ can be expressed as $\|P_\pi\|_{B \to B} = 1$ for
every permutation $\pi \in S_n$, where the permutation operator $P_\pi$ is defined by the
permutation matrix $P_\pi$, (5.14), in the natural basis of the space $\mathbb{R}^n$. A norm
$\|\cdot\|_B$ with this property is said to be a symmetric norm. A Banach space $B$ with a
symmetric norm is said to be a symmetric Banach space.

Let an operator $A$ in the space $\mathbb{R}^n$ be defined by its matrix
$A = [a_{jk}]_{1 \le j,k \le n}$ in the natural basis of the space $\mathbb{R}^n$ and assume
that it satisfies the norm estimates

$$\|A\|_{l^1 \to l^1} \le 1 \quad \text{and} \quad \|A\|_{l^\infty \to l^\infty} \le 1. \tag{5.22}$$

Then, as noted in an earlier section,

$$\sum_{1 \le j \le n} |a_{jk}| \le 1, \ 1 \le k \le n \quad \text{and} \quad \sum_{1 \le k \le n} |a_{jk}| \le 1, \ 1 \le j \le n. \tag{5.23}$$

According to one generalization of Birkhoff's theorem, a matrix $A$ satisfying the conditions
(5.23) admits a representation of the form

where the $\lambda_\pi$ are real (not necessarily non-negative) numbers satisfying the conditions

Therefore, since $\|P_\pi\|_{B \to B} = 1$, the operator $A$ must be a contraction in this norm:

$$\|A\|_{B \to B} \le 1. \tag{5.24}$$

Thus, the following result holds: 

THEOREM (Interpolation theorem for symmetric Banach spaces). Let an operator $A$
acting in the space $\mathbb{R}^n$ be a contraction in the $l^1$ and $l^\infty$ norms, i.e.,
let the estimates (5.22) hold. Then the operator $A$ is a contraction in every symmetric norm
$\|\cdot\|_B$ on $\mathbb{R}^n$, i.e., the estimate (5.24) holds.

Here we presented the simplest interpolation result for symmetric spaces. A more advanced
result can be found in [Mit]. Thus, the development of ideas initiated by Schur
leads to interpolation theorems for Banach spaces with symmetric norms.
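The statement of the interpolation theorem can be probed numerically. The sketch below takes a matrix whose absolute row sums and column sums do not exceed one and checks, on random sample vectors, that it does not increase several symmetric norms (the $l^p$ norms and a norm given by the sum of the $k$ largest absolute values of the coordinates). Sampling cannot prove the theorem; it only illustrates it. All names and the sample matrix are assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence

Norm = Callable[[Sequence[float]], float]


def apply_matrix(A: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Return the vector Ax."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def p_norm(p: float) -> Norm:
    """The l^p norm, a symmetric norm for every p >= 1."""
    return lambda x: sum(abs(v) ** p for v in x) ** (1.0 / p)


def sup_norm(x: Sequence[float]) -> float:
    """The l^infinity norm."""
    return max(abs(v) for v in x)


def k_largest_norm(k: int) -> Norm:
    """Sum of the k largest absolute values of the coordinates (a symmetric norm)."""
    return lambda x: sum(sorted((abs(v) for v in x), reverse=True)[:k])


def row_col_sums_ok(A: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Check that all absolute row sums and column sums are at most 1."""
    n = len(A)
    rows = all(sum(abs(v) for v in row) <= 1 + tol for row in A)
    cols = all(sum(abs(A[j][k]) for j in range(n)) <= 1 + tol for k in range(n))
    return rows and cols


def is_contraction_on_samples(A: Sequence[Sequence[float]], norm: Norm,
                              trials: int = 500, seed: int = 0,
                              tol: float = 1e-9) -> bool:
    """Test norm(Ax) <= norm(x) on random vectors x (a sanity check, not a proof)."""
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if norm(apply_matrix(A, x)) > norm(x) + tol:
            return False
    return True


# Sample matrix: every absolute row sum and column sum equals 1.
A = [[0.5, -0.5, 0.0],
     [0.25, 0.25, 0.5],
     [-0.25, 0.25, -0.5]]
```

By contrast, the matrix with rows (1, 1) and (0, 0) has an absolute row sum equal to 2 and maps the vector (1, 1) to (2, 0), so it is not a contraction in the $l^\infty$ norm.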

The last topic which we discuss here is the Schur-Horn convexity theorem. A. Horn
([HorA1], Theorem 4) obtained the following strengthening of the second part of the
Hardy-Littlewood-Polya theorem:

THEOREM (A. Horn). Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be any two points
in $\mathbb{R}^n$ such that $y \prec x$. Then there exists an ortho-stochastic matrix $M$
such that $y = Mx$.

The following result is a direct consequence of the cited theorems of Rado and A. Horn:

Given $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, the following two sets coincide:

1. The set $C_x$, i.e., the convex hull of the family of vectors $\{x_\pi\}_{\pi \in S_n}$.

2. The set $\{Mx\}$, where $M$ runs over the set of all ortho-stochastic matrices.

In view of the relations (5.2) and (5.3), the last statement can be reformulated in terms
of eigenvalues and diagonal entries: Let us associate with every real symmetric $n \times n$
matrix $H = [h_{jk}]_{1 \le j,k \le n}$ the $n$-tuple $h(H) = (h_{11}, \dots, h_{nn})$ of its
diagonal entries and the $n$-tuple $\omega(H) = (\omega_1(H), \dots, \omega_n(H))$ of its
eigenvalues arranged in non-increasing order: $\omega_1(H) \ge \dots \ge \omega_n(H)$. We
consider these $n$-tuples as vectors in $\mathbb{R}^n$. Given an $n$-tuple
$\omega = (\omega_1, \dots, \omega_n)$ of real numbers, arranged in non-increasing order:
$\omega_1 \ge \dots \ge \omega_n$, let

$$\mathcal{H}_\omega = \{H : H \text{ is real symmetric and } \omega(H) = \omega\}.$$

THEOREM (Schur-Horn convexity theorem). Given an $n$-tuple
$\omega = (\omega_1, \dots, \omega_n)$ of real numbers, $\omega_1 \ge \dots \ge \omega_n$, the
set $\{h(H)\}_{H \in \mathcal{H}_\omega}$ of all "diagonals" of matrices from
$\mathcal{H}_\omega$ is convex. Moreover,

$$\{h(H)\}_{H \in \mathcal{H}_\omega} = C_\omega, \tag{5.25}$$

where $C_\omega$ is the convex hull of the family of $n!$ vectors
$\omega_\pi = (\omega_{\pi(1)}, \dots, \omega_{\pi(n)})$, as $\pi$ runs over the set $S_n$ of
all permutations of the set $\{1, \dots, n\}$:

$$C_\omega = \operatorname{Conv}\{\omega_\pi : \pi \in S_n\}. \tag{5.26}$$

Schur himself established the formula

$$\{h(H)\}_{H \in \mathcal{H}_\omega} = \{M\omega : M \text{ is ortho-stochastic}\}.$$

He did not describe the set on the right geometrically as a convex hull. The term "convex
set" does not appear in the paper [Sch18] at all. The "Schur-Horn convexity theorem"
appeared only in the paper by A. Horn [HorA2] (which used in an essential way the cited
results by Hardy-Littlewood-Polya and Birkhoff). However, the influence of Issai Schur
on the area was so great that the term "Schur-Horn convexity theorem" is now common.
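Schur's formula can be illustrated numerically. In the sketch below an orthogonal matrix $Q$ is built as a product of two plane rotations, the matrix with entries $q_{jk}^2$ plays the role of an ortho-stochastic matrix, and the diagonal of $Q \operatorname{diag}(\omega) Q^T$ is compared with $M\omega$ and tested for majorization by $\omega$. The helper names, the rotation angles and the sample eigenvalues are assumptions made for this example only.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rotation(n: int, p: int, q: int, theta: float) -> Matrix:
    """Plane (Givens) rotation in coordinates p and q; an orthogonal matrix."""
    Q = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    Q[p][p] = math.cos(theta)
    Q[q][q] = math.cos(theta)
    Q[p][q] = -math.sin(theta)
    Q[q][p] = math.sin(theta)
    return Q


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Matrix product AB."""
    return [[sum(A[r][m] * B[m][c] for m in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def is_majorized(y: Sequence[float], x: Sequence[float], tol: float = 1e-9) -> bool:
    """Check the partial-sum conditions of majorization for y and x."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    for k in range(1, len(xs)):
        if sum(ys[:k]) > sum(xs[:k]) + tol:
            return False
    return abs(sum(xs) - sum(ys)) <= tol


omega = [5.0, 2.0, -1.0]                      # prescribed eigenvalues (sample data)
Q = matmul(rotation(3, 0, 1, 0.7), rotation(3, 1, 2, -1.1))
M = [[v * v for v in row] for row in Q]       # entries q_jk^2: ortho-stochastic
D = [[omega[r] if r == c else 0.0 for c in range(3)] for r in range(3)]
H = matmul(matmul(Q, D), transpose(Q))        # real symmetric with eigenvalues omega
h = [H[j][j] for j in range(3)]               # its diagonal
M_omega = [sum(M[j][k] * omega[k] for k in range(3)) for j in range(3)]
```

In this example `h` coincides with `M_omega` up to rounding, and `h` is majorized by `omega`, as the theorems of Schur and Horn predict.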

In the last thirty years, the Schur-Horn convexity theorem has been generalized
significantly. In 1973 (fifty years after the publication of [Sch18]) B. Kostant published a
seminal paper [Kos] in which he interpreted the Schur-Horn result as a property of adjoint
orbits of the unitary group and generalized it to arbitrary compact Lie groups. More
precisely, he proved (see especially [Kos], sect. 8) that for an element $x$ in a maximal
abelian subspace $\mathfrak{t}$ in the Lie algebra $\mathfrak{k}$ of a compact Lie group $K$
one has

$$\operatorname{pr}_{\mathfrak{t}}(\operatorname{Ad} K . x) = \operatorname{Conv} W . x,$$

where $\operatorname{pr}_{\mathfrak{t}} : \mathfrak{k} \to \mathfrak{t}$ is the orthogonal
projection (with respect to the Killing form) and $W$ is the Weyl group associated with the
pair $(\mathfrak{t}_{\mathbb{C}}, \mathfrak{k}_{\mathbb{C}})$. Subsequently, M. F. Atiyah [Ati]
and, independently, V. Guillemin and S. Sternberg [GuSt1], [GuSt2] gave an interpretation of
Kostant's theorem as a special case of a theorem on the image of the momentum map
of a Hamiltonian torus action. Atiyah's proofs depend on some ideas from Morse theory.
Subsequently, the results of Kostant, Atiyah, Guillemin and Sternberg were extended to
the setting of symmetric spaces. See, for example, the paper [HNP], where more references
can be found, the paper [BFR] and the book [HiOl], sections 4.3 and 5.5.

In yet another direction, the relevance of doubly-stochastic matrices and Schur averaging
to operator algebras and quantum physics is discussed in the book [AlU].

Thus, once again a relatively short paper of Issai Schur is seen to have had significant
influence on the development of a number of diverse areas of mathematics. In particular,
[Sch18] paved the way to important results in matrix theory, statistics, the theory of Lie
groups and symmetric spaces, symplectic geometry and Hamiltonian mechanics. Many of
these areas are very far from the original setting.

6. Inequalities between the eigenvalues and the singular
values of a linear operator.

Let $A = [a_{jk}]_{1 \le j,k \le n}$ be an $n \times n$ matrix with eigenvalues
$\lambda_1, \dots, \lambda_n \in \mathbb{C}$. In Theorem II of [Sch2], Schur proved the
inequality

$$\sum_{l=1}^{n} |\lambda_l|^2 \le \sum_{j,k=1}^{n} |a_{jk}|^2. \tag{6.1}$$

Schur's proof was based on Theorem I of that paper, in which he established the
fundamental fact that every square matrix $A$ with complex entries is unitarily equivalent to
an upper triangular matrix, i.e., there exists a unitary matrix $U$ such that

$$T = U^* A U = U^{-1} A U \tag{6.2}$$

is upper triangular: $t_{jk} = 0$ for $j > k$. Therefore, the set of eigenvalues of the matrix
$A$ is equal to the set of eigenvalues of the matrix $T$, which in turn is equal to the set of
diagonal entries of $T$. Thus,

$$\sum_{l=1}^{n} |\lambda_l|^2 = \sum_{j=1}^{n} |t_{jj}|^2 \le \sum_{j,k=1}^{n} |t_{jk}|^2 = \operatorname{trace} T^*T = \operatorname{trace} A^*A = \sum_{j,k=1}^{n} |a_{jk}|^2.$$

Apart from its use in the proof of the inequality (6.1), Theorem I serves as a model for
some important constructions in operator theory that will be discussed below.
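For $2 \times 2$ matrices the eigenvalues are available in closed form, so Schur's inequality (6.1) can be checked directly. The sketch below is only an illustration; the function names are assumptions. It also shows that equality holds for a symmetric (hence normal) sample matrix, for which the triangular form $T$ is diagonal.

```python
import cmath
from typing import Sequence, Tuple

Matrix2 = Sequence[Sequence[complex]]


def eigenvalues_2x2(A: Matrix2) -> Tuple[complex, complex]:
    """Roots of the characteristic polynomial z^2 - (trace A) z + det A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root


def frobenius_squared(A: Matrix2) -> float:
    """Sum of |a_jk|^2, i.e. trace A*A."""
    return sum(abs(v) ** 2 for row in A for v in row)


def schur_gap(A: Matrix2) -> float:
    """Right-hand side of (6.1) minus its left-hand side; non-negative by (6.1)."""
    lam1, lam2 = eigenvalues_2x2(A)
    return frobenius_squared(A) - (abs(lam1) ** 2 + abs(lam2) ** 2)
```

For the upper triangular matrix with rows (1, 5) and (0, 2) the gap equals 25, the squared modulus of the off-diagonal entry, while for the symmetric matrix with rows (1, 2) and (2, 1) the gap vanishes.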

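In [Sch2], Schur used (6.1) to obtain simple proofs of the estimates

$$|\lambda_l| \le n \cdot \max_{1 \le j,k \le n} |a_{jk}| \qquad (1 \le l \le n) \tag{6.3}$$

$$|\operatorname{Re} \lambda_l| \le n \cdot \max_{1 \le j,k \le n} |b_{jk}| \quad \text{and} \quad |\operatorname{Im} \lambda_l| \le n \cdot \max_{1 \le j,k \le n} |c_{jk}| \qquad (1 \le l \le n), \tag{6.4}$$

for the eigenvalues $\lambda_l$ of a general $n \times n$ matrix $A = [a_{jk}]$, where
$B = [b_{jk}] = (A + A^*)/2$ and $C = [c_{jk}] = (A - A^*)/(2i)$. The estimates (6.4) were
first obtained by A. Hirsch [Hir]. They were improved to

$$|\operatorname{Im} \lambda_l| \le \sqrt{\frac{n(n-1)}{2}} \cdot \max_{1 \le j,k \le n} |c_{jk}| \qquad (1 \le l \le n) \tag{6.5}$$

for real matrices $A$ by I. Bendixson [Bend] and reproved in [Sch2]. In §7 of [Sch2], the
interesting inequality

$$\sum_{j<k} |\lambda_j - \lambda_k|^2 \le \sum_{j<k} |a_{jj} - a_{kk}|^2 + n \sum_{j \ne k} |a_{jk}|^2 \tag{6.6}$$

is derived and then used to obtain the following estimate for the discriminant

$$D = \prod_{j<k} (\lambda_j - \lambda_k)^2$$

of the characteristic equation $\det(\lambda I_n - A) = 0$:

$$|D|^{\frac{2}{n(n-1)}} \le \frac{2}{n(n-1)} \sum_{j<k} |a_{jj} - a_{kk}|^2 + \frac{2}{n-1} \sum_{j \ne k} |a_{jk}|^2. \tag{6.7}$$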

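In §5 of [Sch2], the well known Hadamard bound

$$|\det A| \le \Bigl( \max_{1 \le j,k \le n} |a_{jk}| \Bigr)^n \cdot n^{n/2} \tag{6.8}$$

on the maximal value of the determinant of a matrix is derived from the inequality (6.1)
with the help of the inequality between the geometric and the arithmetic means:

$$|\det A|^2 = |\lambda_1|^2 \cdots |\lambda_n|^2 \le \Bigl( \frac{|\lambda_1|^2 + \dots + |\lambda_n|^2}{n} \Bigr)^n \le \Bigl( \frac{\sum_{j,k=1}^{n} |a_{jk}|^2}{n} \Bigr)^n.$$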
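The bound (6.8) is easy to test on small matrices. The sketch below computes determinants by cofactor expansion (adequate only for small $n$) and compares them with the right-hand side of (6.8); the function names are assumptions. The matrices with entries $\pm 1$ and mutually orthogonal rows used in the check attain the bound.

```python
from typing import List, Sequence


def det(A: Sequence[Sequence[float]]) -> float:
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for c in range(n):
        minor: List[List[float]] = [list(row[:c]) + list(row[c + 1:]) for row in A[1:]]
        total += (-1) ** c * A[0][c] * det(minor)
    return total


def hadamard_bound(A: Sequence[Sequence[float]]) -> float:
    """Right-hand side of (6.8): (max |a_jk|)^n * n^(n/2)."""
    n = len(A)
    largest = max(abs(v) for row in A for v in row)
    return largest ** n * n ** (n / 2)
```

For the $2 \times 2$ matrix with rows (1, 1) and (1, -1) both sides of (6.8) equal 2.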



The challenge of obtaining simple new proofs of various Hadamard inequalities seems to
have been one of Issai Schur's favorite occupations.

In [Sch2], Schur also considers integral operators $x(t) \to (Kx)(t)$ in $L^2(a, b)$,

$$(Kx)(t) = \int_a^b K(t, \tau)\, x(\tau)\, d\tau \qquad (a \le t \le b) \tag{6.9}$$

with kernels $K(t, \tau)$ that satisfy the condition

$$\int_a^b \!\! \int_a^b |K(t, \tau)|^2 \, dt \, d\tau < \infty. \tag{6.10}$$

Today, such operators are commonly called Hilbert-Schmidt integral operators. Schur
extended the inequality (6.1) to these operators:

$$\sum_l |\lambda_l(K)|^2 \le \int_a^b \!\! \int_a^b |K(t, \tau)|^2 \, dt \, d\tau, \tag{6.11}$$

where the summation on the left hand side is extended over the set of all eigenvalues
$\lambda_l(K)$ of the integral operator $K$. In particular, the series on the left hand side
of (6.11) converges.

One of the fundamental results of the Fredholm theory of integral equations [Fred] is
the identification of the nonzero eigenvalues $\lambda_l(K)$ of an integral operator (6.9)
with a continuous kernel as the reciprocals of the zeros of an entire function
$D_K(\lambda)$ (that is constructed from the kernel $K(t, \tau)$ of this operator). This
function is termed the Fredholm denominator (or the Fredholm determinant) of the operator
(6.9). It is defined by the Taylor series

$$D_K(\lambda) = \sum_{n=0}^{\infty} c_n \lambda^n \tag{6.12}$$

with coefficients

$$c_n = \frac{(-1)^n}{n!} \int_a^b \!\! \cdots \! \int_a^b \det \begin{pmatrix} K(t_1, t_1) & \cdots & K(t_1, t_n) \\ \vdots & & \vdots \\ K(t_n, t_1) & \cdots & K(t_n, t_n) \end{pmatrix} dt_1 \cdots dt_n. \tag{6.13}$$

From (6.13) and the Hadamard inequality (6.8), it follows that if

$$\sigma = (b - a) \Bigl( \max_{a \le t, \tau \le b} |K(t, \tau)| \Bigr) < \infty,$$

then

$$|c_n| \le \sigma^n \cdot n^{n/2} / n!. \tag{6.14}$$

Consequently, the series (6.12) converges for every complex $\lambda$, and its sum
$D_K(\lambda)$ is an entire function that is subject to the bound

$$\ln |D_K(\lambda)| \le \sigma^2 |\lambda|^2 (1 + o(1)) \qquad (|\lambda| \to \infty). \tag{6.15}$$

Thus, the counting function of the zeros $\mu_1, \mu_2, \dots$ of $D_K(\lambda)$,

$$n_K(r) = \#\{\mu_l(K) : |\mu_l(K)| \le r\} = \#\{\lambda_l(K) : |\lambda_l(K)|^{-1} \le r\},$$

satisfies the condition

$$n_K(r) = O(r^2), \quad \text{as } r \to \infty. \tag{6.16}$$

The estimate (6.15) and its consequence (6.16) were known [Lal] before the Schur paper
[Sch2] appeared. However, the estimate (6.11) is stronger than the estimate (6.16). From
the convergence of the series $\sum_l |\lambda_l(K)|^2$ and from the estimate (6.15) it
follows that the Fredholm denominator (6.12) - (6.13) admits the multiplicative decomposition

$$D_K(\lambda) = e^{c\lambda + d\lambda^2} \prod_l \bigl(1 - \lambda \lambda_l(K)\bigr) e^{\lambda \lambda_l(K)} \tag{6.17}$$

for some choice of constants $c$ and $d$. The fact that the Fredholm denominator of the
integral operator (6.9) with a continuous kernel admits a representation of the form (6.17)
was first noted by Schur in § 14 of [Sch2]. (It is important to note that the kernel
$K(t, \tau)$ is not assumed to be symmetric or Hermitian.) This result of Schur is sharp in
the sense that there exists a continuous kernel $K$ on a finite interval $[a, b]$ whose
eigenvalues satisfy the condition

$$\sum_l |\lambda_l(K)|^{2 - \varepsilon} = \infty \quad \text{for every } \varepsilon > 0.$$

To construct an example, let $K(t, \tau) = \varphi(t - \tau)$ for $0 \le t, \tau \le 1$, where
$\varphi(t + 1) = \varphi(t)$ is a continuous periodic function on $\mathbb{R}$ with Fourier
expansion $\varphi(t) \sim \sum_l c_l e^{2\pi i l t}$. Then the functions $e^{2\pi i l t}$ are
eigenfunctions of the kernel and the Fourier coefficients $c_l$ are eigenvalues of this
kernel. A kernel with the desired properties is obtained by choosing a continuous periodic
function $\varphi$ whose Fourier coefficients $c_l$ satisfy the condition
$\sum_l |c_l|^{2 - \varepsilon} = \infty$ for every $\varepsilon > 0$. The first example of
such a function was constructed by T. Carleman [Carl2]. Other examples can be found in [Bar],
Chapt. 4, §16, or in [Zyg], Chapt. 5, (4.9). In his first publication [Carl1], Carleman
proved that in fact $d = 0$ in (6.17). Thus, the scientific career of this outstanding
analyst started with an improvement of a result of Issai Schur.

The inequality (6.1) can also be presented in the form

$$\sum_{l=1}^{n} |\lambda_l(A)|^2 \le \sum_{l=1}^{n} s_l(A)^2, \tag{6.18}$$

where the $\lambda_l(A)$ are the eigenvalues of the matrix $A$ and the numbers $s_l(A)$ are
the singular values of $A$.

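The auxiliary inequality

$$\sum_{l=1}^{n} |\lambda_l(A)| \le \sum_{l=1}^{n} s_l(A) \tag{6.19}$$

can also be proved in an elementary way by using the Schur transformation (6.2) to
reduce the matrix $A$ to upper triangular form. In fact, it suffices to prove (6.19) for
upper triangular matrices $A$, since the transformation (6.2) does not change either the
eigenvalues or the singular values of the matrix. But then, if $\{e_l\}_{1 \le l \le n}$ is
the natural basis of the space $\mathbb{C}^n$,

$$a_{ll} = \langle A e_l, e_l \rangle = \lambda_l(A), \qquad l = 1, \dots, n,$$

up to a reindexing of the eigenvalues, if need be. Now let $A = S \cdot V$ be the polar
decomposition of the matrix $A$: $S \ge 0$, $V^*V = VV^* = I_n$, and let $h_l = V e_l$. Then
the vectors $\{h_l\}_{1 \le l \le n}$ form an orthonormal basis of the space $\mathbb{C}^n$
and, by the Cauchy-Schwarz inequality,

$$|\langle A e_l, e_l \rangle| = |\langle S h_l, e_l \rangle| \le \sqrt{\langle S h_l, h_l \rangle} \cdot \sqrt{\langle S e_l, e_l \rangle}.$$

Therefore,

$$\sum_{l=1}^{n} |\lambda_l(A)| \le \sum_{l=1}^{n} \sqrt{\langle S h_l, h_l \rangle} \sqrt{\langle S e_l, e_l \rangle} \le \sqrt{\sum_{l=1}^{n} \langle S h_l, h_l \rangle} \, \sqrt{\sum_{l=1}^{n} \langle S e_l, e_l \rangle} = \sum_{l=1}^{n} s_l(A),$$

since

$$\sum_{l=1}^{n} \langle S h_l, h_l \rangle = \sum_{l=1}^{n} \langle S e_l, e_l \rangle = \operatorname{trace} S = \sum_{l=1}^{n} \lambda_l(S),$$

and, by the definition of singular values, $\{\lambda_l(S)\}_{l=1}^{n} = \{s_l(A)\}_{l=1}^{n}$.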

Footnote: The proof of (6.19) given above is adapted from [GoKr1], Chapt. IV, §8. See
Theorem 8.1, especially the footnote 7 on p. 128 of the Russian original or on p. 98 of the
English translation.

The inequalities (6.1), written in the form (6.18), and (6.19) were significantly
generalized by H. Weyl [Wey] in 1949. The generalization is based on the concept of
majorization that was discussed in the previous section. A crucial role is played by the
inequalities

$$|\lambda_1(A) \cdot \lambda_2(A) \cdots \lambda_k(A)| \le s_1(A) \cdot s_2(A) \cdots s_k(A) \qquad (k = 1, 2, \dots, n-1), \tag{6.20}$$

which are valid when the eigenvalues $\lambda_k(A)$ and the singular values $s_k(A)$ are
indexed in such a way that $|\lambda_1(A)| \ge |\lambda_2(A)| \ge \dots \ge |\lambda_n(A)|$
and $s_1(A) \ge s_2(A) \ge \dots \ge s_n(A)$. The equality

$$|\lambda_1(A) \cdot \lambda_2(A) \cdots \lambda_n(A)| = s_1(A) \cdot s_2(A) \cdots s_n(A) \tag{6.21}$$

holds because both sides are equal to $|\det A|$. The relations (6.20) and (6.21) mean that
the sequence $\{\ln |\lambda_k(A)|\}_{k=1}^{n}$ is majorized by the sequence
$\{\ln s_k(A)\}_{k=1}^{n}$:

$$\{\ln |\lambda_k(A)|\}_{k=1}^{n} \prec \{\ln s_k(A)\}_{k=1}^{n}. \tag{6.22}$$

In [Wey], Weyl derived the inequalities (6.20) and then applied the inequality

$$\sum_{k=1}^{n} \psi(y_k) \le \sum_{k=1}^{n} \psi(x_k), \tag{6.23}$$

which holds for any convex function $\psi$ on $(-\infty, \infty)$ and any pair of sequences
$\{y_k\}$ and $\{x_k\}$ such that $\{y_k\} \prec \{x_k\}$, to the sequences
$y_k = \ln |\lambda_k(A)|$ and $x_k = \ln s_k(A)$. The inequality (6.23) is a direct
consequence of the result by Schur (which states that the inequality (6.23) holds for
sequences $x$ and $y = Mx$ that are related by a doubly-stochastic matrix $M$), and of the
result by Hardy, Littlewood and Polya, who proved that

$$y \prec x \implies y = Mx$$

for some doubly-stochastic matrix $M$. However, Weyl was not aware of these results and
gave an independent proof of the implication

$$y \prec x \implies (6.23) \tag{6.24}$$

in Lemma 1 of [Wey]. The inequalities (6.20) were known before the paper [Wey] was
published. (See, for example, Exercise 17 on page 110 of the book [TuAi].) However, it
was Hermann Weyl who first combined the inequalities (6.20) with the implication (6.24)
to obtain the following

THEOREM. Let $A$ be an $n \times n$ matrix with eigenvalues $\{\lambda_k(A)\}_{k=1}^{n}$ and
singular values $\{s_k(A)\}_{k=1}^{n}$ (counting multiplicities) and let $\varphi(\cdot)$ be
a function on $(0, \infty)$ such that the function $\psi(t) = \varphi(e^t)$ is convex on
$(-\infty, \infty)$. Then

$$\sum_{k=1}^{n} \varphi(|\lambda_k(A)|) \le \sum_{k=1}^{n} \varphi(s_k(A)). \tag{6.25}$$

Weyl invoked the inequality (6.25) with $\varphi(t) = t^p$ and $p > 0$ to obtain the
following generalization of Schur's inequality (6.18):

$$\sum_{l=1}^{n} |\lambda_l(A)|^p \le \sum_{l=1}^{n} s_l(A)^p \qquad (0 < p < \infty). \tag{6.26}$$
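For a $2 \times 2$ matrix both the eigenvalues and the singular values can be written in closed form, which makes it possible to test the Weyl inequalities on concrete examples. The sketch below uses the identities $s_1^2 + s_2^2 = \sum |a_{jk}|^2$ and $s_1 s_2 = |\det A|$ to obtain the singular values; the function names are assumptions made for this illustration.

```python
import cmath
import math
from typing import Sequence, Tuple

Matrix2 = Sequence[Sequence[complex]]


def eigenvalues_2x2(A: Matrix2) -> Tuple[complex, complex]:
    """Eigenvalues, ordered so that |lambda_1| >= |lambda_2|."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    lam = sorted((tr / 2 + root, tr / 2 - root), key=abs, reverse=True)
    return lam[0], lam[1]


def singular_values_2x2(A: Matrix2) -> Tuple[float, float]:
    """Singular values s_1 >= s_2 from s_1^2 + s_2^2 = sum |a_jk|^2, s_1 s_2 = |det A|."""
    f = sum(abs(v) ** 2 for row in A for v in row)
    d = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    root = math.sqrt(max(f * f - 4 * d * d, 0.0))
    return math.sqrt((f + root) / 2), math.sqrt(max((f - root) / 2, 0.0))


def power_sums(A: Matrix2, p: float) -> Tuple[float, float]:
    """Return (sum |lambda_l|^p, sum s_l^p), the two sides of (6.26)."""
    lam1, lam2 = eigenvalues_2x2(A)
    s1, s2 = singular_values_2x2(A)
    return abs(lam1) ** p + abs(lam2) ** p, s1 ** p + s2 ** p
```

For the matrix with rows (1, 5) and (0, 2), the eigenvalues are 2 and 1, and the singular values are approximately 5.465 and 0.366. The first sum in (6.26) then stays below the second for every exponent tried, including $p = 1/2$.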

Analogous inequalities hold for linear operators $A$ in a Hilbert space $\mathcal{H}$ that
belong to the class $\mathfrak{S}_p$, i.e., for which $\sum_l s_l(A)^p < \infty$, where the
$s_l(A)$ are the eigenvalues of the operator $\sqrt{A^*A}$. (Usually the singular values
$s_l(A)$ are enumerated by the indices $l = 0, 1, 2, \dots$.) The summation in the last
inequality is then extended over all eigenvalues and over all singular values of the
operator $A$. The resulting inequality is very useful in the theory of integral equations.
The point is that it is difficult to calculate the eigenvalues and singular values of an
integral operator in terms of its kernel. However, the singular values can be effectively
estimated from above by approximating the kernel $K(t, \tau)$ by degenerate kernels of the
form $K_n(t, \tau) = \sum_{l=1}^{n} \varphi_l(t) \psi_l(\tau)$ and invoking the fact that

$$s_n(K) = \inf \|K - K_n\|$$

as $K_n(t, \tau)$ runs over the set of all degenerate kernels of the indicated form. (See
[GoKr1], Chapt. 2, § 2, item 3.) The smoother the kernel $K$, the more rapid the rate of
decay of the sequence $\|K - K_n\|$ and thus, the rate of decay of the sequence $s_n(K)$.
The inequality

$$\sum_l |\lambda_l(K)|^p \le \sum_{l=0}^{\infty} s_l(K)^p \tag{6.27}$$

is then used to derive the rate of decay of the eigenvalues $\lambda_n(K)$. This is the
"modern" way to derive the rate of decay of the eigenvalues $\lambda_n(K)$ of an integral
operator from the smoothness of its kernel $K(t, \tau)$. The theory of spline approximation
is often used to construct good approximating kernels. (See, for example, the papers by
M. Sh. Birman and M. Z. Solomyak mentioned in Section 4.)

The "classical" approach, which does not exploit the Weyl inequalities, is more
complicated and gives weaker results. Chang, in a paper [Chang] that appeared before the
paper [Wey], proved that

$$\sum_{l=0}^{\infty} s_l(K)^p < \infty \implies \sum_l |\lambda_l(K)|^p < \infty \tag{6.28}$$

for integral operators (6.9) of Hilbert-Schmidt class, i.e., with kernels $K(t, \tau)$
satisfying the condition (6.10). The "classical" methods of the paper [Chang] are involved
and rather difficult.

Footnote: The results of Schur and of Hardy, Littlewood and Polya that imply the
inequality (6.23) were discussed in the previous section; see Theorems I, II and I'.

The Weyl inequalities are also useful in "abstract" operator theory. Taking
$\varphi(t) = \ln(1 + |\lambda| t)$ (which is admissible, since the function
$\psi(t) = \ln(1 + |\lambda| e^t)$ is convex), one can obtain the inequality

$$\Bigl| \prod_l \bigl(1 - \lambda \lambda_l(A)\bigr) \Bigr| \le \prod_l \bigl(1 + |\lambda| s_l(A)\bigr) \qquad (\lambda \in \mathbb{C}), \tag{6.29}$$

for linear operators $A$ from the class $\mathfrak{S}_1$ of trace class operators in a
Hilbert space. This inequality is useful in the study of the so-called characteristic
determinants of trace class operators and related analytic considerations. (See Chapter IV
of [GoKr1].) In particular, the inequality (6.29) plays an important role in the proof of a
theorem by V. B. Lidskii, which states that the matricial trace and the spectral trace of a
trace class operator coincide. (See [Lid], and [GoKr1], Chapt. III, §8, Theorem 8.1.) This
theorem is of principal importance in operator theory.

The Weyl inequality (6.25) is one of the central tools in the toolbox of modern operator
theory. However, as Weyl himself wrote [Wey], the first step was taken by Schur:

"Long ago I. Schur proved (6.26) for $p = 1$. Recently S.H. Chang showed in his thesis
that, in the case of integral equations, the convergence of $\sum s_l^p$ implies the
convergence of $\sum |\lambda_l|^p$. These two facts led me to conjecture the relation
(6.26), at least for $p \le 1$. After having conceived a simple idea for the proof, I
discussed the matter with C.L. Siegel and J. von Neumann; their remarks have contributed to
the final form and generality in which the results are presented here."

Footnote: The reference to Schur in this quotation from Weyl is not accurate. Schur proved
the inequality (6.26) for $p = 2$, but not for $p = 1$.

Thus, the paper [Sch2] served as a source of inspiration for both T. Carleman and H. Weyl.

7. Triangular representations of matrices
and linear operators.

One of the important theorems of Schur that was discussed in the preceding section 
states that every square matrix is unitarily equivalent to a triangular matrix. In the 
early fifties, stimulated by this theorem of Schur, Moshe (Mikhail Samuilovich) Livsic 
( = Livshits ) obtained an analogue of this result for a class of bounded linear operators 
in a separable Hilbert space. 

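To explain his results, let us first recall that every bounded operator $A$ in a Hilbert
space $\mathcal{H}$ is representable in the form

$$A = B_A + i C_A, \tag{7.1}$$

where

$$B_A = \operatorname{Re} A = \frac{A + A^*}{2} = (B_A)^* \quad \text{and} \quad C_A = \operatorname{Im} A = \frac{A - A^*}{2i} = (C_A)^*. \tag{7.2}$$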
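In finite dimensions the decomposition (7.1) - (7.2) can be verified directly on matrices. The following sketch, with hypothetical helper names, computes the real and imaginary parts of a square complex matrix and checks that both are Hermitian.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[complex]]


def adjoint(A: Sequence[Sequence[complex]]) -> Matrix:
    """Conjugate transpose A*."""
    n = len(A)
    return [[complex(A[c][r]).conjugate() for c in range(n)] for r in range(n)]


def real_and_imaginary_parts(A: Sequence[Sequence[complex]]) -> Tuple[Matrix, Matrix]:
    """Return (B, C) with B = (A + A*)/2 and C = (A - A*)/(2i), as in (7.2)."""
    n = len(A)
    A_star = adjoint(A)
    B = [[(A[r][c] + A_star[r][c]) / 2 for c in range(n)] for r in range(n)]
    C = [[(A[r][c] - A_star[r][c]) / 2j for c in range(n)] for r in range(n)]
    return B, C


def is_hermitian(H: Sequence[Sequence[complex]], tol: float = 1e-12) -> bool:
    """Check H = H* entrywise, up to the tolerance tol."""
    H_star = adjoint(H)
    n = len(H)
    return all(abs(H[r][c] - H_star[r][c]) <= tol for r in range(n) for c in range(n))
```

Applied to any square complex matrix, the two returned matrices are Hermitian and satisfy $A = B + iC$ entrywise.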

Livsic obtained his conclusions in the class $i\Omega$ of bounded linear operators $A$ for
which $C_A$ is of trace class. In the simplest case of this setting,

$$\operatorname{rank}(C_A) = 1, \tag{7.3}$$

and hence $C_A$ must be definite: either $C_A \ge 0$, or $C_A \le 0$. Thus,

$$C_A = j\,|C_A| \quad \text{where} \quad |C_A| = \sqrt{(C_A)^* C_A}, \quad j = +1 \ \text{or} \ j = -1, \tag{7.4}$$

and the imaginary parts $\beta_k$ of the eigenvalues $\lambda_k = \alpha_k + i\beta_k$ of the
operator $A$ are of the form $\beta_k = j\,|\beta_k|$.

Without loss of generality, we may assume that the operator $A$ is completely
non-selfadjoint: There is no invariant subspace for the operator $A$ on which $A$ induces a
self-adjoint operator. Indeed, if the operator $A$ is not completely non-selfadjoint, then it
splits into the orthogonal sum $A = S \oplus A_{\mathrm{cns}}$, where $S$ is a selfadjoint
operator and $A_{\mathrm{cns}}$ is a completely non-selfadjoint operator. Moreover,
$\operatorname{Im} A = 0 \oplus \operatorname{Im} A_{\mathrm{cns}}$.

The eigenvalues $\lambda_k = \alpha_k + i\beta_k$ of a completely non-selfadjoint operator
$A$ with non-negative (non-positive) imaginary part are never real: either $\beta_k > 0$ for
all $k$ (if $C_A \ge 0$), or $\beta_k < 0$ for all $k$ (if $C_A \le 0$).

The triangular model $T$ for an operator $A$ satisfying the condition (7.3) acts in the
model Hilbert space $\mathcal{H}_{\mathrm{mod}} = l^2 \oplus L^2$ that is the orthogonal sum
of the space $l^2$ of square summable one-sided sequences $(\xi_1, \xi_2, \dots)$ of
complex numbers of dimension $n \le \infty$, where $n$ is equal to the number of eigenvalues
of the operator $A$ (counting multiplicities), and $L^2$ is the space of all square summable
complex-valued functions on a finite interval $[0, l]$, where the number $l$ is determined
uniquely by the operator $A$. The spaces $l^2$ and $L^2$ are equipped with the standard
scalar products. The block decomposition of the operator $T$ that corresponds to the
decomposition $\mathcal{H}_{\mathrm{mod}} = l^2 \oplus L^2$ of the space
$\mathcal{H}_{\mathrm{mod}}$ is of the form

$$T = \begin{bmatrix} T_{\mathrm{dis}} & T_{\mathrm{cou}} \\ 0 & T_{\mathrm{con}} \end{bmatrix}, \tag{7.5}$$

where $T_{\mathrm{dis}} : l^2 \to l^2$, $T_{\mathrm{con}} : L^2 \to L^2$ and
$T_{\mathrm{cou}} : L^2 \to l^2$.

The operator $T_{\mathrm{dis}}$, the discrete part of the operator $T$, is defined by its
matrix $[t_{km}]$ in the natural basis of the space $l^2$. This matrix is upper triangular,
i.e., with $j$ as in (7.4),

$$t_{km} = 0 \ \text{for} \ k > m, \qquad t_{kk} = \lambda_k, \qquad t_{km} = i\,|\beta_k|^{1/2}\, j\, |\beta_m|^{1/2} \ \text{for} \ k < m. \tag{7.6}$$

The operator $T_{\mathrm{dis}}$ is bounded, since $A$ is bounded and
$\sum_k |\beta_k| \le \operatorname{trace} |C_A| < \infty$. The operator $T_{\mathrm{con}}$,
the continuous part of the operator $T$, is an integral operator of the form

$$(T_{\mathrm{con}}\, \xi)(t) = \lambda(t)\, \xi(t) + \int_t^l K(t, s)\, \xi(s)\, ds, \qquad 0 \le t \le l, \tag{7.7}$$

where $\lambda(t)$ is a non-decreasing bounded real-valued function on the interval $[0, l]$
which is determined by the operator $A$. The kernel $K(t, s)$ of the integral operator (7.7)
is of the form

$$K(t, s) = 0 \ \text{for} \ 0 \le s < t \le l, \qquad K(t, s) = ij \ \text{for} \ 0 \le t < s \le l, \tag{7.8}$$

i.e., the operator (7.7) can be considered as upper triangular. The summand
$\lambda(t)\xi(t)$ corresponds to the "main diagonal" of this operator. For the operator
$T_{\mathrm{cou}}$, the so called coupling operator, an explicit formula can be obtained.
Thus, the whole operator $T$ can be naturally considered as an upper triangular operator.

DEFINITION. Let $A$ be an operator which acts in a Hilbert space $\mathcal{H}$. An operator
$\tilde{A}$ acting in a larger Hilbert space $\tilde{\mathcal{H}}$,
$\tilde{\mathcal{H}} \supseteq \mathcal{H}$, is said to be an inessential extension of the
operator $A$, if $\tilde{A} = A \oplus S$, where $S$ is a selfadjoint operator acting in the
space $\tilde{\mathcal{H}} \ominus \mathcal{H}$.

THEOREM I' (M. Livsic). Let $A$ be a bounded completely non-selfadjoint linear operator
in a Hilbert space $\mathcal{H}$ such that $C_A$ is one-dimensional. Then there exists an
inessential extension $\tilde{A} : \tilde{\mathcal{H}} \to \tilde{\mathcal{H}}$ of the
operator $A$ that is unitarily equivalent to an "upper triangular" model operator $T$ of the
form (7.5): There exists a unitary operator $U$ acting from
$\mathcal{H}_{\mathrm{mod}} = l^2 \oplus L^2$ onto $\tilde{\mathcal{H}}$ such that

$$T = U^* \tilde{A} U = U^{-1} \tilde{A} U. \tag{7.9}$$

Triangular models of the same general form (7.5) - (7.6) - (7.7) can also be constructed for
bounded linear operators $A$ in a separable Hilbert space $\mathcal{H}$ when $C_A$ is only
assumed to be of trace class. They are, however, a bit more complicated.

For an operator $A$ in a separable Hilbert space $\mathcal{H}$, let us introduce the
non-hermitian subspace $\mathcal{N}_A$ as the closure of the image of its imaginary part
$C_A$:

$$\mathcal{N}_A = \overline{C_A \mathcal{H}}. \tag{7.10}$$

The dimension of the non-hermitian subspace $\mathcal{N}_A$ is said to be the non-hermitian
rank of the operator $A$:

$$n_A = \dim \mathcal{N}_A. \tag{7.11}$$

The restriction $C_A|_{\mathcal{N}_A}$ of the operator $C_A$ to the subspace
$\mathcal{N}_A$, considered as an operator in the Hilbert space $\mathcal{N}_A$, is a
selfadjoint operator for which the point $0$ is not an eigenvalue. Therefore, the polar
decomposition of this operator is of the form

$$C_A|_{\mathcal{N}_A} = J_A \cdot M_A, \tag{7.12}$$

where

$$J_A : \mathcal{N}_A \to \mathcal{N}_A, \quad J_A = J_A^*, \quad J_A^2 = I_{\mathcal{N}_A} \quad \text{and} \quad M_A : \mathcal{N}_A \to \mathcal{N}_A, \quad M_A \ge 0. \tag{7.13}$$

(In this polar decomposition, $J_A$ is the unitary operator and $M_A$ is the operator modulus.)

^The operator $T$ will not contain a discrete part $T_{\mathrm{dis}}$ if the operator $A$ has no eigenvalues. It will not
contain a continuous part $T_{\mathrm{con}}$ if ? = 0.

To construct the triangular model of the operator $A$, let us choose a Hilbert space $\mathcal{E}$ of
the same dimension as the non-hermitian subspace $\mathcal{N}_A$: $\dim \mathcal{E} = \dim \mathcal{N}_A$. Let $J_{\mathcal{E}}$ be an
operator in $\mathcal{E}$,

\[ J_{\mathcal{E}} : \mathcal{E} \to \mathcal{E}, \quad J_{\mathcal{E}} = J_{\mathcal{E}}^*, \quad J_{\mathcal{E}}^2 = I_{\mathcal{E}}, \tag{7.14} \]

of the same signature^ as that of the operator $J_A$. For the sake of brevity, we shall restrict
our attention to the case of operators with real spectrum only. The model space $\mathcal{H}_{\mathrm{mod}}$
in this case is the space $L^2_{\mathcal{E}}([0, l])$, i.e., the space of all square-integrable functions on a
finite interval $[0, l] \subset \mathbb{R}$ whose values are elements of the Hilbert space $\mathcal{E}$, with the scalar
product

\[ \langle \xi, \eta \rangle_{\mathcal{H}_{\mathrm{mod}}} = \int_0^l \langle \xi(t), \eta(t) \rangle_{\mathcal{E}} \, dt, \quad \text{for } \xi = \xi(t), \ \eta = \eta(t) \in L^2_{\mathcal{E}}([0, l]). \]

The model operator $T$ acts in the space $\mathcal{H}_{\mathrm{mod}}$ according to the rule

\[ (T\xi)(t) = \lambda(t)\xi(t) + 2i \int_t^l \Pi(t) J_{\mathcal{E}} \Pi(s)^* \xi(s) \, ds, \tag{7.15} \]

where $\lambda(t)$ is a non-decreasing real-valued function on the interval $[0, l]$ and $\Pi(t)$ is a
function on the interval $[0, l]$ whose values are Hilbert-Schmidt operators in $\mathcal{E}$ that satisfy
the normalization condition

\[ \operatorname{trace}_{\mathcal{E}} \Pi(t)^* \Pi(t) = 1, \quad 0 \le t \le l. \tag{7.16} \]



THEOREM I'' (M. Livsic). Let $A$ be a bounded completely non-selfadjoint linear operator
in a Hilbert space $\mathcal{H}$ such that $C_A$ is of trace class and the spectrum of $A$ is real. Then
there exists an inessential extension $\tilde{A} : \tilde{\mathcal{H}} \to \tilde{\mathcal{H}}$, $\tilde{\mathcal{H}} \supseteq \mathcal{H}$, of the operator $A$ that is
unitarily equivalent to an "upper triangular" model operator $T$ of the form (7.15): there
exists a unitary operator $U$ acting from $\mathcal{H}_{\mathrm{mod}} = L^2_{\mathcal{E}}([0, l])$ onto $\tilde{\mathcal{H}}$ such that

\[ T = U^* \tilde{A} U = U^{-1} \tilde{A} U. \tag{7.17} \]

REMARK. Direct calculation shows that

\[ ((T - T^*)\xi)(t) = 2i \int_0^l \Pi(t) J_{\mathcal{E}} \Pi(s)^* \xi(s) \, ds. \tag{7.18} \]



^The spectrum of an operator $J$ which possesses the properties $J = J^*$, $J^2 = I$ can consist of the points
$+1$ and $-1$ only. These points are eigenvalues of $J$. Let $p_J$ and $q_J$ denote the dimensions of the
corresponding eigenspaces, $0 \le p_J, q_J \le \infty$. The signature of the operator $J$ is the pair $(p_J, q_J)$.

Thus, the model operator (7.15) is of the form

\[ (T\xi)(t) = \lambda(t)\xi(t) + 2i \int_0^l \chi(t, s) H(t, s) \xi(s) \, ds, \tag{7.19} \]

where

\[ \chi(t, s) = 1 \ \text{ for } s > t, \qquad \chi(t, s) = 0 \ \text{ for } s < t, \]
and the kernel $H(t, s) = \Pi(t) J_{\mathcal{E}} \Pi(s)^*$ represents the imaginary part of the operator $T$:

\[ ((T - T^*)\xi)(t) = 2i \int_0^l H(t, s) \xi(s) \, ds. \tag{7.20} \]

In other words, the kernel $K(t, s)$ that represents the operator $T$ can be obtained from
the kernel $H(t, s)$ that represents the imaginary part of $T$ by means of "truncation to
the upper triangle": $K(t, s) = \chi(t, s) H(t, s)$.
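
A finite-dimensional analogue may make the truncation rule concrete. The sketch below is only an illustration under stated assumptions: integral kernels are replaced by $n \times n$ complex matrices, the factor $2i$ is kept explicit, and the helper names (`adjoint`, `imaginary_part`, `truncate_upper`) are hypothetical. For a strictly upper triangular matrix $N$, playing the role of the integral term of the model operator, it checks that $N$ is recovered from its imaginary part $H = (N - N^*)/(2i)$ by keeping only the entries above the diagonal.

```python
def adjoint(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def imaginary_part(m):
    """Return H = (M - M*) / (2i)."""
    ma = adjoint(m)
    n = len(m)
    return [[(m[i][j] - ma[i][j]) / 2j for j in range(n)] for i in range(n)]


def truncate_upper(h):
    """Discrete analogue of chi(t, s): keep entries with column > row, zero the rest."""
    n = len(h)
    return [[h[i][j] if j > i else 0j for j in range(n)] for i in range(n)]


# A strictly upper triangular ("Volterra-like") matrix; the values are arbitrary.
volterra = [[0j, 1 + 2j, 3j],
            [0j, 0j, -1 + 1j],
            [0j, 0j, 0j]]

h = imaginary_part(volterra)
# Truncation to the upper triangle, multiplied by 2i, gives the matrix back.
recovered = [[2j * x for x in row] for row in truncate_upper(h)]
```

The matrix `h` is Hermitian, and `recovered` agrees with `volterra` entry by entry, which is the matrix counterpart of reading off the triangular kernel from the kernel of the imaginary part.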

For operators whose imaginary part is of trace class but whose spectrum is not necessarily
real, the triangular model has a more complicated form; see, e.g., [Liv5] and [BroLi].

Moshe Livsic introduced the machinery of characteristic functions of linear operators in
the mid-forties in order to solve a number of problems connected with the theory of extensions
of linear operators; see [Liv1] and [Liv2]. He then applied this machinery to establish the
unitary equivalence of an operator of the class $i\Omega$ to a triangular model in the early fifties.
See [Liv3], [Liv4] for the first results and [Liv5] for a detailed presentation.

The characteristic function of a non-selfadjoint linear operator $A$ acting in a Hilbert
space $\mathcal{H}$ is defined as follows: Choose a Hilbert space $\mathcal{E}$ of the same dimension as the
non-hermitian subspace $\overline{C_A \mathcal{H}}$ of the operator $A$ and then factor the operator $C_A$ in the
form

\[ C_A = \Gamma J_{\mathcal{E}} \Gamma^*, \tag{7.21} \]

where $\Gamma$, $J_{\mathcal{E}}$ are linear operators,

\[ \Gamma : \mathcal{E} \to \mathcal{H}, \quad J_{\mathcal{E}} : \mathcal{E} \to \mathcal{E}, \quad J_{\mathcal{E}} = J_{\mathcal{E}}^*, \quad J_{\mathcal{E}}^2 = I_{\mathcal{E}} \quad (I_{\mathcal{E}} \text{ denotes the identity operator in } \mathcal{E}). \tag{7.22} \]

The characteristic function $W_A(z)$ of the operator $A$ is the operator-valued function of
the complex variable $z$ that is defined for $z$ outside the spectrum of $A$ by the rule

\[ W_A(z) = I_{\mathcal{E}} + 2i \, \Gamma^* (zI - A)^{-1} \Gamma J_{\mathcal{E}}. \tag{7.23} \]

Notice that $W_A(z)$ acts in the Hilbert space $\mathcal{E}$, which, in many problems of interest, is
a finite-dimensional space. In Livsic's terminology, the space $\mathcal{E}$ is said to be the channel
space and the operator $\Gamma$ is said to be the channel operator.

Livsic showed that the characteristic function is a unitary invariant of a completely
non-selfadjoint operator: Let $A_1$ and $A_2$ be two completely non-selfadjoint operators such
that their characteristic functions $W_{A_1}(z)$ and $W_{A_2}(z)$ (with the same channel space $\mathcal{E}$)
are equal: $W_{A_1}(z) = W_{A_2}(z)$. Then the operators $A_1$ and $A_2$ are unitarily equivalent. To
reduce a non-selfadjoint operator to triangular form, Livsic calculated its characteristic
function $W_A(z)$ and then constructed a model operator $T$ in such a way that its characteristic
function $W_T(z)$ coincides with $W_A(z)$. Subsequently, triangular models of operators
were partially superseded by functional models; see [SzNFo], [Bran], [NiVa].
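
For a concrete instance of the definition (7.23), the following sketch evaluates the characteristic function in the simplest possible setting. The assumptions are ours, not the survey's: the space is one-dimensional, so the "operator" $A$ is a single complex number $a$ with $\operatorname{Im} a > 0$, the channel space is one-dimensional with $J_{\mathcal{E}} = 1$, and $\Gamma$ is the scalar $\sqrt{\operatorname{Im} a}$. In that case direct algebra reduces (7.23) to $W_A(z) = (z - \bar{a})/(z - a)$, which has modulus one on the real axis. The function names are hypothetical.

```python
def characteristic_function(a, gamma, j_sign, z):
    """Scalar case of (7.23): W(z) = 1 + 2i * conj(gamma) * (z - a)^(-1) * gamma * j_sign."""
    return 1 + 2j * gamma.conjugate() * gamma * j_sign / (z - a)


a = complex(1.0, 2.0)                    # the "operator" on a one-dimensional space
c_a = (a - a.conjugate()) / 2j           # imaginary part C_A = Im a = 2
j_sign = 1                               # J = +1 because C_A > 0
gamma = complex(c_a.real ** 0.5, 0.0)    # gamma * j_sign * conj(gamma) = C_A, cf. (7.21)


def closed_form(z):
    """The factor (z - conj(a)) / (z - a) obtained from (7.23) by hand."""
    return (z - a.conjugate()) / (z - a)


sample_points = [0.5, 3.0, -7.25, complex(2.0, 1.0)]
values = [characteristic_function(a, gamma, j_sign, z) for z in sample_points]
```

At the three real sample points the computed values have modulus one, and at every sample point they agree with `closed_form` up to rounding.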

There is also a very important correspondence between the invariant subspaces of an 
operator A and certain divisors of its characteristic function. However, the importance 
of the notion of a characteristic function is not confined to its applications in operator 
theory. Livsic related the theory of stationary linear dynamical systems to the theory 
of linear non-selfadjoint operators and showed that the characteristic matrix function of 
a linear operator that serves as an "inner operator" for the dynamical system can be 
identified with the scattering matrix of this system. Examples are furnished in
[Liv7] and [BroLi]. A detailed presentation of the early stages^ of the theory of open
systems (as Livsic termed them) can be found in [Liv8]. In particular, as he noted in
the first sentence of Section 2.2 of that source: "The resolution of a system into a chain 
of elementary systems is closely related to the reduction of the operator ... to triangular 
form." 

The results of Livsic on reducing operators to triangular form are similar in form to
the result of Schur. However, the methods that he used are completely different from the
method of Schur. Schur's result implies that there exists an orthonormal basis $e_1, \ldots, e_n$
of the space $\mathbb{C}^n$ such that the given matrix $A$ is upper triangular in this basis. Thus, if

\[ \mathcal{H}_0 = 0, \qquad \mathcal{H}_k = \operatorname{span}\{e_1, \ldots, e_k\}, \quad k = 1, 2, \ldots, n, \tag{7.24} \]

then this collection $\{\mathcal{H}_k\}_{0 \le k \le n}$ of subspaces of $\mathbb{C}^n$ possesses the following properties:

i. $0 = \mathcal{H}_0 \subset \mathcal{H}_1 \subset \mathcal{H}_2 \subset \cdots \subset \mathcal{H}_n = \mathbb{C}^n$,

ii. $\dim(\mathcal{H}_k \ominus \mathcal{H}_{k-1}) = 1$, (7.25)

iii. Every subspace $\mathcal{H}_k$ is invariant for the operator $A$.
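
Property iii can be checked mechanically in coordinates. The sketch below is an illustration only: it assumes that the orthonormal basis is the standard basis of $\mathbb{C}^n$, so that $\mathcal{H}_k$ consists of the vectors whose coordinates beyond the $k$-th vanish, and the function name `chain_is_invariant` is hypothetical. In these coordinates, invariance of every $\mathcal{H}_k$ amounts to the matrix being upper triangular.

```python
def chain_is_invariant(a):
    """Check property iii of (7.25) for H_k = span{e_1, ..., e_k} in the standard basis."""
    n = len(a)
    for k in range(1, n + 1):
        for col in range(k):            # image A e_{col+1}, where e_{col+1} lies in H_k
            for row in range(k, n):     # coordinates that lie outside H_k
                if a[row][col] != 0:
                    return False
    return True


upper = [[2, 1, 5],
         [0, 3, -1],
         [0, 0, 4]]

not_upper = [[2, 1, 5],
             [0, 3, -1],
             [7, 0, 4]]       # A e_1 has a component along e_3, so H_1 is not invariant

results = (chain_is_invariant(upper), chain_is_invariant(not_upper))
```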

Conversely, let an operator $A$ in $\mathbb{C}^n$ and a collection of subspaces $\{\mathcal{H}_k\}_{0 \le k \le n}$ satisfying
the conditions (7.25 i)-(7.25 iii) be given, and let $e_k \in \mathcal{H}_k \ominus \mathcal{H}_{k-1}$ be unit vectors for
$k = 1, 2, \ldots, n$. Then the set of vectors $\{e_k\}$ forms an orthonormal basis of $\mathbb{C}^n$ and the
matrix of the operator $A$ in this basis is upper triangular. It turns out that this strategy

can be adapted to obtain analogues of Schur's theorem in infinite-dimensional Hilbert
spaces. The first step in this direction was taken by L.A. Sakhnovich [Sakh1], who noticed
that although the proof of Schur is based on the fact that every operator $A$ in a finite-dimensional
linear space over $\mathbb{C}$ has an eigenvector, the proof really depends only
upon the following property of the operator $A$:

Property IS. For every pair of closed invariant subspaces $\mathcal{H}_1$ and $\mathcal{H}_2$ of the operator $A$
such that $\mathcal{H}_1 \subset \mathcal{H}_2$ and $\dim(\mathcal{H}_2 \ominus \mathcal{H}_1) > 1$, there exists a third closed invariant subspace
$\mathcal{H}_3$ of the operator $A$ such that $\mathcal{H}_1 \subset \mathcal{H}_3 \subset \mathcal{H}_2$, $\mathcal{H}_3 \ne \mathcal{H}_1$, $\mathcal{H}_3 \ne \mathcal{H}_2$.

*For more information on the characteristic function of a linear operator, see also the M.S. Livsic
Anniversary Volume [OTSTR], in particular, the Preface and the paper [Kats].

^A more elaborate presentation of scattering theory for linear stationary dynamical systems (with
emphasis on applications to the wave equation in $\mathbb{R}^n$) was carried out in [LaPhi].

A theorem of J. von Neumann (unpublished) and of N. Aronszajn and K. Smith [ArSm]
guarantees that every compact operator in a Hilbert space possesses the property IS.
In [Sakh1], Sakhnovich proved that if the imaginary part $C_A$ of the operator $A$ is of
Hilbert-Schmidt class, i.e., if

\[ \sum_k s_k(C_A)^2 < \infty, \tag{7.26} \]

where $s_k(C_A)$ denote the singular values of $C_A$, then the operator $A$ possesses the property IS.
From later results of V.I. Matsaev it follows that this condition can be relaxed: if

\[ \sum_k \frac{s_k(C_A)}{k} < \infty, \tag{7.27} \]

then $A$ possesses the property IS.

Let $\mathcal{H}$ be a Hilbert space and let $\mathfrak{P}_{\mathcal{H}}$ denote the collection of orthoprojectors onto
all possible closed subspaces of $\mathcal{H}$. The set $\mathfrak{P}_{\mathcal{H}}$ is partially ordered: $P_1 \le P_2$ if the
corresponding ranges are ordered by inclusion, i.e., if $\mathcal{R}_{P_1} \subseteq \mathcal{R}_{P_2}$, and $P_1 < P_2$ if the
inclusion of the ranges is proper. A subset $\mathfrak{P}$ of the set $\mathfrak{P}_{\mathcal{H}}$ that contains at least two
orthoprojectors is said to be a chain if it is fully ordered, i.e., if the conditions $P_1 \in \mathfrak{P}$,
$P_2 \in \mathfrak{P}$, $P_1 \ne P_2$ imply that either $P_1 < P_2$ or $P_2 < P_1$.

If a chain $\mathfrak{P}$ contains orthoprojectors $P^-$ and $P^+$ ($P^- < P^+$) such that every orthoprojector
$P \in \mathfrak{P}$ distinct from them satisfies either the inequality $P < P^-$ or the inequality
$P > P^+$, then the pair $(P^-, P^+)$ is said to be a jump in the chain $\mathfrak{P}$ and the dimension
of the subspace $P^+\mathcal{H} \ominus P^-\mathcal{H}$ is said to be the dimension of the jump. A chain without
jumps is said to be continuous.

The set of all chains in $\mathcal{H}$ can be ordered by inclusion: the chain $\mathfrak{P}_1$ is said to precede
the chain $\mathfrak{P}_2$ (and we write $\mathfrak{P}_1 \prec \mathfrak{P}_2$) if every orthoprojector in $\mathfrak{P}_1$ also lies in $\mathfrak{P}_2$. A chain
$\mathfrak{P}$ is said to be maximal with respect to this ordering if there is no chain $\mathfrak{P}'$ satisfying
the conditions $\mathfrak{P} \prec \mathfrak{P}'$, $\mathfrak{P} \ne \mathfrak{P}'$.

Let $A$ be a bounded operator in a Hilbert space $\mathcal{H}$ and let $\mathfrak{P}$ be a chain of orthoprojectors
in $\mathcal{H}$. Then the chain $\mathfrak{P}$ is said to be an eigenchain for the operator $A$ if for every $P \in \mathfrak{P}$
the subspace $P\mathcal{H}$ is invariant under $A$, i.e., if the equality $AP = PAP$ holds for every
$P \in \mathfrak{P}$.



The following result is established by transfinite induction in [Brod2] and in [Brod4],
where it appears as Theorem 15.2.

THEOREM II [M.S. Livsic-M.S. Brodskii; L.A. Sakhnovich]. Let $A$ be a bounded linear
operator in a Hilbert space $\mathcal{H}$ that satisfies the condition IS. Then there exists a maximal
chain of orthoprojectors that is an eigenchain for $A$.

The idea for the proof of this theorem arose in a conversation between M.S. Livsic and
M.S. Brodskii. (See the historical remark in the book [Brod4], p. 278 of the Russian
original or p. 234 of the English translation.) L.A. Sakhnovich gave an independent proof
in [Sakh1]. This theorem can be considered as a first step in extending the Schur theorem
on reducing a matrix to triangular form to the setting of a more general class of operators
in Hilbert space. Based on it, Sakhnovich obtained the following result in [Sakh1]:

Every bounded linear operator $A$ in a separable Hilbert space that satisfies the condition
IS has an inessential extension $\tilde{A}$ which is unitarily equivalent to an integral operator
of the form

\[ x(t) \to (Kx)(t) = \frac{d}{dt} \int_t^1 K(t, s) x(s) \, ds, \tag{7.28} \]

acting in a space $L^2_{\mathcal{E}}([0, 1])$ of vector functions $x(t)$ defined on the interval $[0, 1]$ whose
values belong to a Hilbert space $\mathcal{E}$, $\dim \mathcal{E} \le \infty$, provided with the scalar product

\[ \langle x, y \rangle_{L^2_{\mathcal{E}}} = \int_0^1 \langle x(t), y(t) \rangle_{\mathcal{E}} \, dt, \quad \text{for } x = x(t) \text{ and } y = y(t). \tag{7.29} \]

The kernel $K(t, s)$ is a function defined for $0 \le t, s \le 1$ whose values are bounded linear
operators acting in $\mathcal{E}$.

A limitation of this last result is that the class of kernels $K(t, s)$ is not described.
However, starting from this theorem, Sakhnovich obtained the following result in [Sakh2]:

Every bounded operator $A$ in a Hilbert space $\mathcal{H}$ whose spectrum is real and whose
imaginary part $C_A$ is of Hilbert-Schmidt class has an inessential extension $\tilde{A}$ which is
unitarily equivalent to an operator of the form

\[ x(t) \to H(t)x(t) + \int_t^1 K(t, s) x(s) \, ds, \tag{7.30} \]

acting in the space $L^2_{\mathcal{E}}([0, 1])$ of functions defined on the interval $[0, 1]$ whose values belong
to a Hilbert space $\mathcal{E}$, provided with the scalar product (7.29). $H(t)$ is a function defined
on $[0, 1]$ whose values are bounded selfadjoint operators in $\mathcal{E}$: $H(t) = H^*(t)$ for $t \in [0, 1]$.
The kernel $K(t, s)$ is a function defined for $0 \le t, s \le 1$ whose values are operators acting
in the Hilbert space $\mathcal{E}$ that are of Hilbert-Schmidt class. The kernel $K$ satisfies the
condition

\[ \int_0^1 \operatorname{trace}_{\mathcal{E}} \{(K^*K)(t, t)\} \, dt < \infty. \tag{7.31} \]



This result of Sakhnovich is on the one hand more general than the corresponding result
of Livsic (because the condition (7.26) is less restrictive than requiring $C_A$ to be of trace
class), but on the other hand it is less concrete, since it provides less information on the
form of the kernel $K$ than the other theorem.

Further developments in this area are related to the theory of the abstract triangular
representation of operators in a Hilbert space by means of an integral with respect to
a chain. This integral appeared in the papers of M.S. Brodskii at the end of the fifties,
[Brod1], [Brod2], [Brod3]. In a short time the theory of this new integral and its applications
were developed considerably. Important contributions to this theory were made by
V.I. Matsaev, [Mats1], and by I.Ts. Gohberg and M.G. Krein, [GoKr3], [GoKr4], [GoKr5].
The development of this theory stimulated new analytic investigations of the spectral
properties of both selfadjoint and non-selfadjoint operators.

To explain the definition of this integral, we begin with a finite-dimensional example.
Let $\mathcal{H}$ be a complex $n$-dimensional Hilbert space, $n < \infty$. Let $A$ be an operator in $\mathcal{H}$,
and let $\{e_k\}_{1 \le k \le n}$ be an orthonormal basis in $\mathcal{H}$. Then the operator $A$ can be written in
the form

\[ A = \sum_{j,k=1}^{n} e_j \, a_{jk} \, \langle \cdot, e_k \rangle, \tag{7.32} \]

where $a_{jk} = \langle A e_k, e_j \rangle$ are the entries of the matrix of the operator $A$ in this basis. Let
the subspaces $\mathcal{H}_k$ be defined by (7.24), and let $P_k$ be the orthoprojector onto $\mathcal{H}_k$. The
collection $\mathfrak{P}$ of the orthoprojectors

\[ 0 = P_0 < P_1 < \cdots < P_{n-1} < P_n = I \tag{7.33} \]

forms a maximal chain in $\mathcal{H}$. This chain is an eigenchain for $A$. Let

\[ \Delta P_k = P_k - P_{k-1}, \quad k = 1, 2, \ldots, n. \tag{7.34} \]

Then, since

\[ P_k - P_{k-1} = e_k \langle \cdot, e_k \rangle \quad \text{and} \quad e_j \, a_{jk} \, \langle \cdot, e_k \rangle = \Delta P_j \, A \, \Delta P_k, \tag{7.35} \]
formula (7.32) can be written in the form

\[ A = \sum_{j,k=1}^{n} \Delta P_j \, A \, \Delta P_k. \tag{7.36} \]

Moreover, if $a_{jk} = 0$ for some choice of $j, k$, then, by (7.35), $\Delta P_j A \Delta P_k = 0$ in (7.36).
Thus, if the matrix $a_{jk}$ is upper triangular, i.e., if $a_{jk} = 0$ for $j > k$, and if $a_{kk} = \lambda_k$ (an
eigenvalue of the matrix $A$), then the representation (7.36) takes the form

\[ A = \sum_{k=1}^{n} \lambda_k \, \Delta P_k + \sum_{k=2}^{n} \sum_{j=1}^{k-1} \Delta P_j \, A \, \Delta P_k. \tag{7.37} \]

Since $\sum_{j=1}^{k-1} \Delta P_j = P_{k-1}$, (7.37) can be rewritten in the form

\[ A = \sum_{k=1}^{n} \lambda_k \, \Delta P_k + \sum_{k=1}^{n} P_{k-1} \, A \, \Delta P_k. \tag{7.38} \]

The first sum on the right-hand side of (7.38) represents the "diagonal part" of $A$,
the second sum represents the "super-diagonal" part with respect to the Schur basis
$\{e_k\}_{1 \le k \le n}$. (Everything here depends on the choice of the basis.) Since the matrix of the
adjoint operator $A^*$ (with respect to the same orthonormal basis) is lower triangular, i.e.,
$\Delta P_j A^* \Delta P_k = 0$ for $j < k$, and $P_{k-1} A^* \Delta P_k = 0$, the Schur result can be expressed as
follows:



For every operator $A$ in a finite-dimensional Hilbert space there exists at least one
maximal eigenchain $\mathfrak{P} = \{P_k\}_{0 \le k \le n}$. For every such eigenchain, the operator $A$ admits
two representations: (7.38) and (with appropriate indexing) the representation

\[ A = \sum_{k=1}^{n} \lambda_k \, \Delta P_k + 2i \sum_{k=1}^{n} P_{k-1} \, C_A \, \Delta P_k, \tag{7.39} \]

where $\Delta P_k$ is defined by (7.34) and $C_A = \dfrac{A - A^*}{2i}$. □

The sums in (7.39) can be considered as "integrals" over the chain $\mathfrak{P}$:

\[ A = \int_{\mathfrak{P}} \lambda(P) \, dP + 2i \int_{\mathfrak{P}} P \, C_A \, dP. \tag{7.40} \]
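
The two finite sums can be verified numerically. The sketch below is an illustration under our own assumptions: $A$ is a $3 \times 3$ upper triangular complex matrix with arbitrarily chosen entries, the Schur basis is the standard basis, $P_k$ is the diagonal projector onto the first $k$ coordinates, and all helper names are hypothetical. It assembles the diagonal sum together with either the sum of $P_{k-1} A \Delta P_k$ or $2i$ times the sum of $P_{k-1} C_A \Delta P_k$, where $C_A = (A - A^*)/(2i)$, and compares both results with $A$.

```python
def matmul(x, y):
    n = len(x)
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def scale(c, x):
    return [[c * p for p in row] for row in x]


def adjoint(x):
    n = len(x)
    return [[x[j][i].conjugate() for j in range(n)] for i in range(n)]


def projector(k, n):
    """P_k: orthoprojector onto span{e_1, ..., e_k}; P_0 = 0 and P_n = I."""
    return [[1 if (i == j and i < k) else 0 for j in range(n)] for i in range(n)]


n = 3
a = [[2 + 1j, 1 - 1j, 3 + 0j],
     [0j, -1 + 0j, 2j],
     [0j, 0j, 0.5 - 2j]]
c_a = scale(1 / (2j), add(a, scale(-1, adjoint(a))))     # C_A = (A - A*) / (2i)

zero = [[0j] * n for _ in range(n)]
diagonal, upper_from_a, upper_from_c = zero, zero, zero
for k in range(1, n + 1):
    p_prev = projector(k - 1, n)
    delta_p = add(projector(k, n), scale(-1, p_prev))    # Delta P_k = P_k - P_{k-1}
    diagonal = add(diagonal, scale(a[k - 1][k - 1], delta_p))
    upper_from_a = add(upper_from_a, matmul(matmul(p_prev, a), delta_p))
    upper_from_c = add(upper_from_c, scale(2j, matmul(matmul(p_prev, c_a), delta_p)))

sum_with_a = add(diagonal, upper_from_a)     # diagonal part + sum of P_{k-1} A dP_k
sum_with_c = add(diagonal, upper_from_c)     # diagonal part + 2i * sum of P_{k-1} C_A dP_k
```

Both `sum_with_a` and `sum_with_c` reproduce `a` up to rounding. The second identity relies on $A$ being upper triangular: the entries of $A^*$ above the diagonal vanish, so above the diagonal $2i\,C_A$ coincides with $A$.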

In the case of the finite-dimensional $\mathcal{H}$ that was just discussed, the "integrals" in (7.40)
are no more than a notation for the finite sums in (7.39). It is not a problem to generalize
integrals of the form $\int \lambda(P) \, dP$ to the infinite-dimensional case. This is the usual integral
of a scalar function with respect to an orthogonal spectral measure. Integrals of this kind
are well understood, because of their connection with the needs of the theory of selfadjoint
operators. However, integrals of the form

\[ \mathfrak{J}(X, \mathfrak{P}) = \int_{\mathfrak{P}} P X \, dP \tag{7.41} \]

for an arbitrary chain $\mathfrak{P}$ of orthoprojectors and a more or less general bounded linear
operator $X$ in an infinite-dimensional Hilbert space $\mathcal{H}$ are more difficult to handle^. An
integral of the form (7.41) can be defined by means of a very natural limiting process that
was introduced by M.S. Brodskii^: as usual, certain integral sums should be constructed
and then the passage to the limit should be performed. The condition

\[ (P^+ - P^-) X (P^+ - P^-) = 0 \quad \text{for every jump } (P^-, P^+) \text{ of the chain } \mathfrak{P} \tag{7.42} \]

is an evident necessary condition for the existence of the integral (7.41). However, the
problem of obtaining sufficient conditions for the existence of such an integral turned out
to be far more difficult. The theory of such integrals, the so-called integrals of triangular
truncation, was created mainly in the works of M.S. Brodskii, I.Ts. Gohberg, M.G. Krein
and V.I. Matsaev and served to complete a program that was initiated by M.S. Livsic
(see the remark to Theorem I'' of this section). A detailed exposition of this theory
is presented in [Brod4], [GoKr2] and [GoGoK]. Brodskii proved that under condition
(7.42), the integral (7.41) exists if the operator $X$ is of trace class $\mathfrak{S}_1$. V.I. Matsaev,
[Mats1], sharpened this result. He proved that under the condition (7.42), the integral
(7.41) exists (in the sense of the convergence of integral sums with respect to the uniform
operator norm) if the compact operator $X$ belongs to the class $\mathfrak{S}_\omega$, i.e., if the condition
$\sum_{1 \le k < \infty} s_k(X) \, k^{-1} < \infty$ holds. The latter result is precise in some sense. If a compact
operator $X$ does not belong to the class $\mathfrak{S}_\omega$, then there exists a continuous maximal chain
$\mathfrak{P}$ such that the integral (7.41) does not exist even in the sense of weak convergence; see
[Brod4], Lemma 22.2. In any case, if the operator $X$ is compact and if the integral
(7.41) exists (in the sense of the convergence of integral sums with respect to the uniform
operator norm), then this integral represents a Volterra operator. We recall that a linear
operator in a Hilbert space is said to be a Volterra operator if it is compact and if its
spectrum consists of only one point, the point zero.

^Integrals of scalar-valued functions with respect to operator-valued measures and integrals of operator-valued
functions with respect to a scalar-valued measure are usually much easier to deal with than integrals
of operator-valued functions with respect to operator-valued measures. In the integral (7.41), both the
function $PX$ and "the measure" $dP$ are operator valued.

^An integral of the form (7.41) can be considered (under appropriate parametrization of the chain $\mathfrak{P}$)
as a special case of a double integral operator of the form

\[ \int_0^1 \int_0^1 \chi(t, s) \, dP(t) \, X \, dP(s), \quad \text{with } \chi(t, s) = 1 \text{ for } s > t, \ \chi(t, s) = 0 \text{ for } s < t. \]

We already met such integrals in Section [J]. However, here the function $\chi$ is of a very special form, and
the results which can be obtained for double operator integrals with this function are much more precise
than the results which follow from the general theory of double integral operators.

^Recall that the singular values of a compact operator $X$ are the eigenvalues of the operator $\sqrt{X^*X}$,
indexed in such a way that $s_1(X) \ge s_2(X) \ge s_3(X) \ge \ldots$, and that $X \in \mathfrak{S}_1$ if $\sum_{k=1}^{\infty} s_k(X) < \infty$.

The representation (7.40) of an operator $A$ by means of the integral of triangular truncation
can be considered as a coordinate-free representation of $A$ from its maximal eigenchain
and its imaginary part. On the one hand, this representation generalizes the results of
Livsic (see Theorem I'' and the Remark following it, which focuses attention on the formulas
(7.19) and (7.20)). On the other hand, the representation (7.40) is "coordinate free", i.e.,
it represents the operator $A$ itself in the original Hilbert space $\mathcal{H}$, rather than a "model"
operator $T$ that acts in the "model" space $\mathcal{H}_{\mathrm{mod}}$ and which is only unitarily equivalent to
the original operator $A$ (or even to an inessential extension $\tilde{A}$ of $A$ acting in a larger
space $\tilde{\mathcal{H}} \supseteq \mathcal{H}$). In spirit, the representation (7.40) is much closer to the original work
of Schur [Sch2] than the triangular model (7.15) of Livsic. The integral representation
(7.40) for a bounded linear operator $A$ with imaginary part $C_A \in \mathfrak{S}_1$ was first obtained
by Brodskii in [Brod1] using the representation (7.19)-(7.20) as a model. Brodskii just
transformed this representation to the coordinate-free form (7.40). This proof used the
theory of characteristic functions. Later, in [Brod2] and [Brod3], the representation (7.40)
was obtained for arbitrary Volterra operators $A$ in a Hilbert space (in which case $\lambda(P) = 0$
in (7.40)), and also for bounded linear operators $A$ with real spectrum and $C_A \in \mathfrak{S}_\omega$,
independently of the theory of characteristic functions, by methods based on consideration
of the eigenchains of the operators $A$, i.e., by generalizing the reasoning of Issai Schur.

The study of the integral of triangular truncation has led to unexpected and deep connections
between the spectra of the real and imaginary components of Volterra operators.
In certain cases the clarification of these connections has required the development of new
analytic tools; see Chapter III of the book [GoKr2]. As an example of the application
of the general results obtained in the setting of the integral of triangular truncation, we
consider the Volterra operator $A = B + iC$, $B = B^*$, $C = C^*$, in the Hilbert space $L^2([0, 1])$
that is defined by the equality

\[ \ldots \tag{7.43} \]

where the function $h(\cdot)$ is periodic: $h(t + 1) = h(t)$, Hermitian: $h(-t) = \overline{h(t)}$, and
summable on $[0, 1]$. It is easily checked that the eigenvalues of the
operators $B$ and $C$ (appropriately indexed) are related by the discrete Hilbert transform.

Consequently, it is possible to obtain estimates for the discrete Hilbert transform by
applying some results on the spectra of the Hermitian components of Volterra operators.

Thus, the Schur paper [Sch2], which is elementary and purely algebraic, stimulated the
creation of several deep and rich analytic theories.

8. Sequences of multipliers that preserve

the class of polynomials with only real zeros,

and entire functions of the Laguerre-Polya-Schur class.



Problems related to the distribution of the zeros of polynomials have attracted the
attention of mathematicians for a long time. In particular, the following question has
generated considerable interest: How many real zeros does a given polynomial with
real coefficients have? There are several methods for either estimating or determining
precisely the number of zeros of such a polynomial that belong to a given interval $(a, b)$ of
the real axis. These include Descartes' rule of signs, the Budan-Fourier algorithm, the
Sturm algorithm and methods based on Hermitian forms. These methods are presented
in old books on algebra ([Web], Vol. 1, [Kur]), as well as in books devoted to the zeros
of polynomials, [Obr], [Mar], [Dieu]. The article [KrNa] contains a detailed survey of
the method of Hermitian forms for the separation of the zeros of polynomials. A lot of
additional material on the distribution of roots of polynomials can be found in [PoSz],
Part V.

In this section we shall focus on a different class of results that deal with transformations
that preserve the class of polynomials $P(t) = p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n$ with real coefficients
for which

\[ \#_{nr}(P) = \text{the number of non-real roots of the polynomial } P(t) \]

is equal to zero. The simplest result of this kind states that if $P(t)$ is a polynomial with
real coefficients and $\alpha \in \mathbb{R}$, then

\[ \#_{nr}(P) = 0 \implies \#_{nr}(\alpha P + P') = 0. \]

If the roots of $P(t)$ are distinct, then this follows easily from Rolle's theorem applied to
$e^{\alpha t} P(t)$, whose derivative is $e^{\alpha t}(\alpha P(t) + P'(t))$.

THEOREM. Let $P(t) = p_0 + p_1 t + p_2 t^2 + \cdots + p_n t^n$ and $Q(t) = q_0 + q_1 t + q_2 t^2 + \cdots + q_m t^m$
be polynomials with real coefficients and assume that $\#_{nr}(Q) = 0$. Then the following
conclusions hold:

(1) [Hermite] $\#_{nr}\bigl(Q(\tfrac{d}{dt})P\bigr) \le \#_{nr}(P)$.

(2) [Laguerre] If also the roots of $Q(t)$ fall outside the interval $[0, n]$, then

\[ \#_{nr}\bigl(Q(0)p_0 + Q(1)p_1 t + Q(2)p_2 t^2 + \cdots + Q(n)p_n t^n\bigr) \le \#_{nr}(P). \]

(3) [Malo] If also the roots of $Q(t)$ are either all positive or all negative and $l = \min(n, m)$,
then

\[ \#_{nr}(P) = 0 \implies \#_{nr}(p_0 q_0 + p_1 q_1 t + \cdots + p_l q_l t^l) = 0. \]

(4) [Schur] If also the roots of $Q(t)$ are either all positive or all negative and $l = \min(n, m)$, then

\[ \#_{nr}(P) = 0 \implies \#_{nr}(p_0 q_0 + 1!\, p_1 q_1 t + 2!\, p_2 q_2 t^2 + \cdots + l!\, p_l q_l t^l) = 0. \]



The cited results may be found in [Obr], [Lag1], [Mal] and [Sch7], respectively; see also
[PoSz], Part V, Chapter 1, § 5, no. 63 and 67 for the first two.
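
Statement (4) can be tried out on a small example. The sketch below is not taken from the cited sources: it tests whether a real polynomial has only real zeros by means of a Sturm chain evaluated at $\pm\infty$ (Sturm's algorithm is one of the classical methods mentioned at the beginning of this section), using exact rational arithmetic. The function names and the sample polynomials $P(t) = (t-1)(t+2)(t-3)$ and $Q(t) = (t+1)(t+2)(t+3)$ are our own choices. Coefficient lists are ordered from the constant term upward.

```python
from fractions import Fraction
from math import factorial


def trim(p):
    """Drop trailing zeros; p[k] is the coefficient of t^k."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def remainder(num, den):
    """Remainder of the polynomial division num / den (exact, via Fraction)."""
    num = [Fraction(c) for c in trim(num)]
    den = [Fraction(c) for c in trim(den)]
    while len(num) >= len(den):
        shift = len(num) - len(den)
        factor = num[-1] / den[-1]
        for i, c in enumerate(den):
            num[shift + i] -= factor * c
        num = trim(num)          # the leading term cancels, so num gets shorter
    return num


def sturm_chain(p):
    """Sturm chain p, p', -rem(p, p'), ...; its last entry is a gcd of p and p'."""
    p = trim(p)
    chain = [p, [k * c for k, c in enumerate(p)][1:]]
    while True:
        rem = remainder(chain[-2], chain[-1])
        if not rem:
            return chain
        chain.append([-c for c in rem])


def sign_changes(values):
    nonzero = [v for v in values if v != 0]
    return sum(1 for u, v in zip(nonzero, nonzero[1:]) if (u > 0) != (v > 0))


def has_only_real_zeros(p):
    """True if a nonconstant real polynomial has no non-real zeros."""
    chain = sturm_chain(p)
    at_plus_inf = [q[-1] for q in chain]
    at_minus_inf = [q[-1] * (-1) ** (len(q) - 1) for q in chain]
    distinct_real = sign_changes(at_minus_inf) - sign_changes(at_plus_inf)
    degree = len(chain[0]) - 1
    gcd_degree = len(chain[-1]) - 1      # accounts for multiple zeros
    return distinct_real == degree - gcd_degree


def schur_composition(p, q):
    """Coefficients of p_0 q_0 + 1! p_1 q_1 t + 2! p_2 q_2 t^2 + ... up to min degree."""
    return [factorial(k) * pk * qk for k, (pk, qk) in enumerate(zip(p, q))]


p = [6, -5, -2, 1]       # (t - 1)(t + 2)(t - 3): only real zeros
q = [6, 11, 6, 1]        # (t + 1)(t + 2)(t + 3): only negative zeros
composed = schur_composition(p, q)       # 36 - 55 t - 24 t^2 + 6 t^3
```

For these inputs `has_only_real_zeros(composed)` is true, in agreement with statement (4), while the same test applied to `[1, 0, 1]` (that is, $1 + t^2$) is false.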
The last three statements of the theorem deal with an operation of the form

\[ p_0 + p_1 t + \cdots + p_n t^n \to \gamma_0 p_0 + \gamma_1 p_1 t + \cdots + \gamma_n p_n t^n. \tag{8.1} \]

In the particular case considered by Schur, $\gamma_k = q_k$ for $k \le m$, and $\gamma_k = 0$ for $k > m$,
where the $q_k$ are obtained from the coefficients of a polynomial $Q(t)$ with only negative
roots that we now write as $Q(t) = q_0 + \frac{q_1}{1!} t + \frac{q_2}{2!} t^2 + \cdots + \frac{q_m}{m!} t^m$. The importance of this
result is that it admits a converse: Every sequence $\{\gamma_k\}_{0 \le k < \infty}$ for which the operation
(8.1) preserves the class of polynomials with real coefficients and $\#_{nr}(P) = 0$ is either
generated by a polynomial $Q(t) = q_0 + \frac{q_1}{1!} t + \frac{q_2}{2!} t^2 + \cdots + \frac{q_m}{m!} t^m$ with only negative zeros, or
belongs to the closure of sequences generated by such polynomials. A full description of
this class of sequences $\{\gamma_k\}_{0 \le k < \infty}$ is presented in the paper [PS] by G. Polya and I. Schur
and will now be described briefly.

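Given an infinite unilateral sequence

\[ \Gamma = \{\gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_k, \ldots\} \tag{8.2} \]

of real numbers, let $\Phi(t)$ denote the (formal) power series

\[ \Phi(t) = \sum_{k=0}^{\infty} \frac{\gamma_k}{k!} \, t^k, \tag{8.3} \]

and, for any polynomial

\[ P(t) = p_0 + p_1 t + \cdots + p_n t^n, \tag{8.4} \]
let $\Gamma[P(t)]$ denote the new polynomial:

\[ \Gamma[P(t)] = \gamma_0 p_0 + \gamma_1 p_1 t + \gamma_2 p_2 t^2 + \cdots + \gamma_n p_n t^n. \tag{8.5} \]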
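
The coefficientwise operation (8.5) is easy to experiment with. In the sketch below, which is only an illustration, the function names are hypothetical and the real-zero test is restricted to quadratics, where it reduces to the sign of the discriminant. The sequence `gamma_from_q` is generated, in the sense described above, by $Q(t) = (1 + t)^2 = 1 + \frac{2}{1!} t + \frac{2}{2!} t^2$, which has only negative zeros. The sequence `gamma_bad` is an arbitrary sequence chosen to show that not every real sequence preserves real zeros.

```python
def apply_multipliers(gamma, p):
    """Coefficients of Gamma[P] as in (8.5); gamma is treated as 0 beyond its length."""
    padded = list(gamma) + [0] * max(0, len(p) - len(gamma))
    return [g * c for g, c in zip(padded, p)]


def quadratic_has_only_real_zeros(c):
    """For c0 + c1 t + c2 t^2 with c2 != 0: both zeros are real iff c1^2 - 4 c0 c2 >= 0."""
    c0, c1, c2 = c
    return c1 * c1 - 4 * c0 * c2 >= 0


gamma_from_q = [1, 2, 2]      # from Q(t) = (1 + t)^2 = 1 + (2/1!) t + (2/2!) t^2
gamma_bad = [1, 0, 1]         # removes the linear term

p = [1, 2, 1]                 # P(t) = (1 + t)^2, a double zero at t = -1

image_good = apply_multipliers(gamma_from_q, p)    # 1 + 4 t + 2 t^2
image_bad = apply_multipliers(gamma_bad, p)        # 1 + t^2
```

Here `image_good` still has only real zeros (discriminant $16 - 8 > 0$), whereas `image_bad` is $1 + t^2$, so `gamma_bad` cannot preserve the class of polynomials with only real zeros.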

DEFINITION I ([PS]). I. The sequence (8.2) is said to be a sequence of multipliers of the
first type if for every polynomial $P(t)$ with real coefficients (of arbitrary degree $n$),

\[ \#_{nr}(P) = 0 \implies \#_{nr}(\Gamma[P]) = 0. \]

II. The sequence (8.2) is said to be a sequence of multipliers of the second type if for every
polynomial $P(t)$ with real coefficients (of arbitrary degree $n$) and only negative zeros,

\[ \#_{nr}(\Gamma[P]) = 0. \]

DEFINITION II ([PS]). I. An entire function $\Phi$ is an entire function of the first
type if it admits a multiplicative representation of the form

\[ \Phi(t) = c \, t^l \, e^{\alpha t} \prod_k (1 + \delta_k t), \tag{8.6} \]

where $c \ne 0$ is a real number, $l$ is a non-negative integer, $\alpha$ is a non-negative real number,
and the $\delta_k$ are non-negative numbers that satisfy the condition $\sum_k \delta_k < \infty$.

II. An entire function $\Phi$ is an entire function of the second type if it admits a
multiplicative representation of the form

\[ \Phi(t) = c \, t^l \, e^{-\beta t^2 + \alpha t} \prod_k (1 + \delta_k t) \, e^{-\delta_k t}, \tag{8.7} \]

where $c \ne 0$ is a real number, $l$ is a non-negative integer, $\beta$ is a non-negative number, $\alpha$
is a real number, and the $\delta_k$ are real numbers that satisfy the condition $\sum_k (\delta_k)^2 < \infty$.

THEOREM (G. Polya and I. Schur, [PS]). I. If the sequence $\gamma_0, \gamma_1, \ldots, \gamma_k, \ldots$ is a
sequence of multipliers of the first (respectively the second) type, then the series (8.3)
converges in the whole complex plane, and the entire function $\Phi(t)$ which is represented
by this series is an entire function of the first (respectively the second) type.

II. If $\Phi(t)$ is an entire function of the first (respectively the second) type, and (8.3) is
its Taylor expansion, then the sequence $\gamma_0, \gamma_1, \ldots, \gamma_k, \ldots$ is a sequence of multipliers
of the first (respectively the second) type.

This theorem gives a full description of the sequences of multipliers of both the first and
second type. The appearance of two types of multipliers (and two types of entire functions)
corresponds to the fact that in the Schur theorem from [Sch7] that was stated above,
the polynomials $P(t)$ and $Q(t)$ appear in a symmetric way: if one of the polynomials
$P(t) = p_0 + p_1 t + p_2 t^2 + \cdots$ or $Q(t) = q_0 + q_1 t + q_2 t^2 + \cdots$ has only real zeros, and the other
has only negative zeros, then all the zeros of the polynomial $p_0 q_0 + 1!\, p_1 q_1 t + 2!\, p_2 q_2 t^2 + \cdots$
are real. Thus, roughly speaking, sequences of the first (respectively second) type act
on polynomials that are entire functions of the second (respectively first) type. The two
types of entire functions arise as limits of the two classes of polynomials:



THEOREM (E. Laguerre, [Lag2]; G. Polya, [Pol1]).

I. Let $\Phi(t)$ be an entire function of the first type (respectively the second type). Then
there exists a sequence $\{\Phi_n(t)\}_{n=1,2,\ldots}$ of polynomials such that the zeros of every $\Phi_n(t)$ lie
in the negative half-axis (respectively the real axis) and $\Phi_n(t)$ converges to $\Phi(t)$ locally
uniformly in the whole complex plane.

II. If a sequence of polynomials $\{\Phi_n(t)\}_{n=1,2,\ldots}$ converges uniformly in a neighborhood
of the origin to a function that is not identically equal to zero and if all the zeros of
every polynomial $\Phi_n$ lie in the negative half-axis (respectively the real axis), then the
sequence $\{\Phi_n\}_{n=1,2,\ldots}$ converges locally uniformly in the whole complex plane and the
limit function is an entire function of the first (respectively second) type.

Part I of this theorem was obtained by Laguerre, [Lag2]; part II was obtained by Polya,
[Pol1]. Laguerre obtained a weak version of part II. Namely, he assumed that the sequences
of polynomials $\{\Phi_n\}$ considered above converge in the whole complex plane, not just in
a neighborhood of the origin, and deduced the same properties of the limiting function
that are stated in part II of the preceding theorem. This result of Laguerre is not strong
enough to obtain a description of the multiplier sequences; the stronger result by Polya
is needed. The theorems of Laguerre and Polya, and some generalizations, can be found
in [HiWi], Chapter III, §3, and in [Lev], Chapter VIII.

The paper [PS] by Polya and Schur served as a source of inspiration for the investigations
of I.J. Schoenberg on the representation of totally positive functions and sequences. The
notion of total positivity was introduced by Schoenberg in [Scho1].

DEFINITION III. A real function (or, in other terms, kernel) $K(t, s)$ of two variables
ranging over linearly ordered sets $T$ and $S$, respectively, is said to be totally positive if for
every^ $m$ and for every

\[ t_1 < t_2 < \cdots < t_m, \quad s_1 < s_2 < \cdots < s_m, \quad t_i \in T, \ s_j \in S, \tag{8.8} \]

the inequalities

\[ K\begin{pmatrix} t_1, t_2, \ldots, t_m \\ s_1, s_2, \ldots, s_m \end{pmatrix} > 0 \tag{8.9} \]

hold, where

\[ K\begin{pmatrix} t_1, t_2, \ldots, t_m \\ s_1, s_2, \ldots, s_m \end{pmatrix} = \det \begin{pmatrix} K(t_1, s_1) & K(t_1, s_2) & \cdots & K(t_1, s_m) \\ K(t_2, s_1) & K(t_2, s_2) & \cdots & K(t_2, s_m) \\ \vdots & \vdots & & \vdots \\ K(t_m, s_1) & K(t_m, s_2) & \cdots & K(t_m, s_m) \end{pmatrix}. \tag{8.10} \]

Usually $T$ and $S$ are either subintervals of the real axis (that may coincide with the
full axis), or countable sets of real numbers such as the set of all integers or the set of all
non-negative integers, or even finite sets of integers. If $T$ and $S$ are sets of integers, then
$K$ can be viewed as a matrix; in this case, $K$ is referred to as a totally positive matrix.

^If both sets $T$ and $S$ are infinite, then $m$ can be an arbitrary natural number; if at least one of the sets
$T$ or $S$ is finite, then $m$ can be an arbitrary natural number satisfying the restriction $m \le \min\{|T|, |S|\}$,
where $|\mathcal{M}|$ denotes the cardinality of the set $\mathcal{M}$.

A concept that is more general than total positivity is sign regularity.

A function $K(t, s)$ is said to be sign regular if there exists a sequence of numbers $\varepsilon_m$, each of which is equal to either $+1$ or $-1$, such that in the setting of (8.8), the inequalities

$$\varepsilon_m\, K\begin{pmatrix} t_1, & t_2, & \dots, & t_m \\ s_1, & s_2, & \dots, & s_m \end{pmatrix} \ge 0 \tag{8.11}$$

hold.

Totally positive matrices (and kernels) have very interesting spectral properties that were discovered by F.R. Gantmacher and M.G. Krein, [GaKr1], [GaKr2], [GaKr3]. All the eigenvalues of a totally positive matrix are positive and distinct. Moreover, its eigenvectors possess oscillatory properties that are analogous to the oscillatory properties of the eigenfunctions of Sturm-Liouville differential equations. A presentation of the spectral properties of totally positive matrices and kernels can also be found in the survey article [Pink] by A. Pinkus. However, the notion of total positivity was introduced by Schoenberg [Scho1] in his study of variation-diminishing kernels. Strictly speaking, in [Scho1], the definitions of total positivity and sign regularity were formulated for the case of finite matrices; generalizations to wider settings were developed later by Schoenberg himself and by S. Karlin. (See the book [Kar] for the references and for the history.)

Let $V[z_1, \dots, z_l]$ denote the number of sign changes of a given sequence $[z_1, z_2, \dots, z_l]$ of real numbers, when the zero terms are discarded. For example, $V[1, 0, 1, 0, -1] = 1$ and $V[1, -1, 1, 1, -1, 1] = 4$.
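The counting rule for $V$ can be sketched in a few lines of Python. This is only an illustration; the function name `sign_changes` is a hypothetical choice and does not come from the sources cited here.

```python
def sign_changes(seq):
    """Return V[seq]: the number of sign changes once zero terms are discarded."""
    # Keep only the signs of the non-zero terms.
    signs = [1 if x > 0 else -1 for x in seq if x != 0]
    # Count adjacent pairs whose signs differ.
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


print(sign_changes([1, 0, 1, 0, -1]))   # the zeros are skipped: + + - gives 1
print(sign_changes([2, -3, 0, -1, 5]))  # + - - + gives 2
```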

DEFINITION IV. Let $K = [k_{ij}]$ be a $p \times q$ matrix with real entries $k_{ij}$, $1 \le i \le p$, $1 \le j \le q$, with $p, q < \infty$. The matrix $K$ is said to be variation-diminishing, if for every sequence $x = [x_1, x_2, \dots, x_q]$ of real numbers, the sequence

$$y_i = \sum_{1 \le j \le q} k_{ij} x_j, \qquad (1 \le i \le p), \tag{8.12}$$

enjoys the property

$$V[y_1, y_2, \dots, y_p] \le V[x_1, x_2, \dots, x_q]. \tag{8.13}$$

THEOREM (I.J. Schoenberg, [Scho1]). Let $K$ be a $p \times q$ matrix with real entries.

I. If the matrix $K$ is sign-regular (in particular, if $K$ is totally positive), then $K$ is variation-diminishing.

II. If the matrix $K$ is variation-diminishing, and

$$\operatorname{rank} K = q, \tag{8.14}$$

then $K$ is sign-regular.

(Note to the discussion of spectral properties above: the eigenvalues of a totally positive matrix are positive and distinct under the assumption that all its minors are strictly positive.)
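Part I of the theorem can be illustrated numerically. The sketch below is an assumption-laden example rather than anything taken from [Scho1]: it uses the symmetric Pascal matrix $k_{ij} = \binom{i+j}{i}$ as the test matrix, checks by brute force that all of its minors are non-negative, and then compares $V[y]$ with $V[x]$ for random integer vectors. All function names are hypothetical.

```python
from itertools import combinations
from math import comb
import random


def sign_changes(seq):
    """V[seq]: number of sign changes after discarding zero terms."""
    signs = [1 if x > 0 else -1 for x in seq if x != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def all_minors_nonnegative(k):
    """Brute-force check of total positivity for a finite matrix."""
    p, q = len(k), len(k[0])
    for size in range(1, min(p, q) + 1):
        for rows in combinations(range(p), size):
            for cols in combinations(range(q), size):
                if det([[k[i][j] for j in cols] for i in rows]) < 0:
                    return False
    return True


def never_increases_variation(k, trials=1000, seed=0):
    """Test V[Kx] <= V[x] on random integer vectors x."""
    rng = random.Random(seed)
    q = len(k[0])
    for _ in range(trials):
        x = [rng.randint(-5, 5) for _ in range(q)]
        y = [sum(row[j] * x[j] for j in range(q)) for row in k]
        if sign_changes(y) > sign_changes(x):
            return False
    return True


pascal = [[comb(i + j, i) for j in range(4)] for i in range(4)]
print(all_minors_nonnegative(pascal), never_increases_variation(pascal))
```

A random test of this kind can only fail to find a counterexample; it does not replace the proof.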

Under the additional restriction (8.14), this theorem gives necessary and sufficient conditions for a $p \times q$ matrix $K$ with real entries to be variation-diminishing. A characterization of variation-diminishing matrices without any restrictions was obtained by Th. Motzkin in his PhD Thesis (Basel, 1934). His thesis was published in 1936, [Mot1]; see also [Mot2]. Additional characterizations of matrix variation diminishing transforms can also be found in Chapter 4, §8 of [HiWi], and Chapter 5, §§1, 2 of [Kar]. The latter is a storehouse of wisdom on total positivity, variation diminishing transformations and related issues and applications.

For a function $x(t)$ which is defined on a linearly ordered set $T$, the number of sign changes $V[x(t)]$ is defined as $V[x(t)] = \sup V[x(t_1), x(t_2), \dots, x(t_l)]$, where the supremum is taken over all $t_1, \dots, t_l$ from $T$ such that $t_1 < t_2 < \dots < t_l$. (It is possible that $V[x(t)] = \infty$.)

For a real-valued kernel $K(t, s)$ defined for $t \in T$, $s \in S$, where $T$ and $S$ are subintervals (finite or infinite) of the real axis, the variation diminishing property also can be formulated in the form

$$V[y(t)] \le V[x(s)] \qquad (t \in T,\ s \in S), \tag{8.15}$$

where

$$y(t) = \int_c^d K(t, s)\, x(s)\, ds, \qquad a < t < b. \tag{8.16}$$

Of course some restrictions have to be imposed on the class of functions $x(s)$ and on the kernel $K(t, s)$ to ensure the existence of the transformation (8.16) and the possibility of counting uniquely the number of changes of sign of the functions $x(s)$ and $y(t)$. (Since the number of changes of sign of a function is defined pointwise, the functions $x(s)$ and $y(t)$ must be defined everywhere, not just almost everywhere on the appropriate intervals.)

The preceding theorem of Schoenberg, which characterizes matrices with variation-diminishing properties in terms of their sign-regularity, can be extended to continuous kernels $K(t, s)$.

It should be mentioned that as early as 1912, in order to estimate the number of real zeros of polynomials with real coefficients, M. Fekete considered formal power series $\sum_{0 \le i < \infty} c_i t^i$ with real coefficients that possess the following property: for a given natural number $r$, all the determinants (8.18) with $m = 1, 2, \dots, r$ are non-negative. Power series with this property are called $r$-times positive. Multiplication by such a power series $\sum_{0 \le i < \infty} c_i t^i$ transforms the power series $\sum_{0 \le k < \infty} x_k t^k$ into the power series $\sum_{0 \le j < \infty} y_j t^j$ according to the rule

$$\sum_{0 \le j < \infty} y_j t^j = \Big(\sum_{0 \le i < \infty} c_i t^i\Big)\Big(\sum_{0 \le k < \infty} x_k t^k\Big),$$

or, equivalently,

$$y_j = \sum_{0 \le k \le j} c_{j-k}\, x_k, \qquad 0 \le j < \infty.$$

Fekete formulated the following statement (see footnote number six in [Fek]):
Let $\sum_{0 \le i < \infty} c_i t^i$ be an $r$-times positive (formal) power series, and let $\sum_{0 \le k < \infty} x_k t^k$ be a polynomial of degree $r$ (i.e., $x_k = 0$ for $k > r$) with real coefficients. Then

$$V[y_0, y_1, y_2, \dots, y_j, \dots] \le V[x_0, x_1, \dots, x_r].$$



Totally positive matrices and kernels that depend on the difference of their arguments are of special interest. Let $\{c_i\}_{-\infty < i < \infty}$ be an infinite bilateral sequence and let the infinite Toeplitz matrix $K$ be defined in terms of this sequence by the rule

$$K = \| k_{p,q} \|_{0 \le p, q < \infty}, \qquad k_{p,q} \stackrel{\mathrm{def}}{=} c_{p-q}. \tag{8.17}$$

If the sequence $c_i$ is unilateral, i.e., if the $c_i$ are defined only for $i \ge 0$, we first extend the original sequence to the set of all integers by setting $c_i \stackrel{\mathrm{def}}{=} 0$ for $i < 0$ and then define the Toeplitz matrix $K$ by rule (8.17) applied to the extended sequence.



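DEFINITION V (I.J. Schoenberg, [Scho2]). The real valued infinite sequence $\{c_i\}$, bilateral or unilateral, is said to be totally positive if the matrix (8.17) is totally positive, i.e., if for every natural $m$ and for every choice of integers $p_1 < p_2 < \dots < p_m$, $q_1 < q_2 < \dots < q_m$ the inequality

$$\det \begin{bmatrix} c_{p_1 - q_1} & c_{p_1 - q_2} & \cdots & c_{p_1 - q_m} \\ c_{p_2 - q_1} & c_{p_2 - q_2} & \cdots & c_{p_2 - q_m} \\ \vdots & \vdots & & \vdots \\ c_{p_m - q_1} & c_{p_m - q_2} & \cdots & c_{p_m - q_m} \end{bmatrix} \ge 0 \tag{8.18}$$

holds.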



DEFINITION VI (I.J. Schoenberg, [Scho2]). A real valued function $\Lambda(t)$ that is defined for all $t \in (-\infty, \infty)$ is said to be totally positive if it satisfies the following three conditions:

i. The kernel $K(t, s) \stackrel{\mathrm{def}}{=} \Lambda(t - s)$, $-\infty < t, s < \infty$, is totally positive, i.e., for every natural number $m$ and for every $t_1 < t_2 < \dots < t_m$, $s_1 < s_2 < \dots < s_m$ the inequality

$$\det \begin{bmatrix} \Lambda(t_1 - s_1) & \Lambda(t_1 - s_2) & \cdots & \Lambda(t_1 - s_m) \\ \Lambda(t_2 - s_1) & \Lambda(t_2 - s_2) & \cdots & \Lambda(t_2 - s_m) \\ \vdots & \vdots & & \vdots \\ \Lambda(t_m - s_1) & \Lambda(t_m - s_2) & \cdots & \Lambda(t_m - s_m) \end{bmatrix} \ge 0 \tag{8.19}$$

holds.

ii. The function $\Lambda(t)$ is measurable.

iii. The function $\Lambda(t)$ is positive for at least two distinct values of $t$.

□



It is not difficult to prove that if a function $\Lambda(t)$ is defined on $\mathbb{R}$ and is nonnegative there, and if the inequalities (8.19) hold for $m = 2$ and for all $t_1 < t_2$, $s_1 < s_2$, then the function $\psi(t) = -\ln \Lambda(t)$ is convex (in the wide sense) on $\mathbb{R}$. In particular, if a function $\Lambda(t)$ is totally positive, then the function $-\ln \Lambda(t)$ is convex (in the wide sense) on $\mathbb{R}$. Therefore, for every totally positive function $\Lambda(t)$ the limits

$$\alpha = \lim_{t \to -\infty} \frac{-\ln \Lambda(t)}{t}, \qquad \beta = \lim_{t \to +\infty} \frac{-\ln \Lambda(t)}{t} \tag{8.20}$$

exist and $-\infty \le \alpha \le \beta \le \infty$. The equality $\alpha = \beta$ holds if and only if $\Lambda(t)$ is of the form

$$\Lambda(t) = e^{kt + l}, \qquad -\infty < t < \infty, \quad \text{for some real constants } k \text{ and } l. \tag{8.21}$$

(A function of the form (8.21) is easily seen to be totally positive since all the determinants (8.19) vanish if $m \ge 2$.) Thus, if a function $\Lambda(t)$ is totally positive, but not of the form (8.21), then $\alpha < \beta$ and hence the two-sided Laplace transform $\int_{-\infty}^{\infty} \Lambda(t) e^{zt}\, dt$ exists for all points $z$ in the open strip $\alpha < \operatorname{Re} z < \beta$ of the complex $z$-plane and represents a holomorphic function there. Moreover, the function represented by this Laplace transform takes strictly positive values for $z \in (\alpha, \beta) \subset \mathbb{R}$. Hence, the reciprocal function $\Psi(z)$:

$$\frac{1}{\Psi(z)} \stackrel{\mathrm{def}}{=} \int_{-\infty}^{\infty} \Lambda(t)\, e^{zt}\, dt, \qquad \alpha < \operatorname{Re} z < \beta, \tag{8.22}$$

is meromorphic in the strip $\alpha < \operatorname{Re} z < \beta$, holomorphic at all points of the interval $(\alpha, \beta) \subset \mathbb{R}$, and takes strictly positive finite values in this interval: $0 < \Psi(x) < \infty$, $\alpha < x < \beta$.



(On the term "convex in the wide sense": a function defined on $\mathbb{R}$ is said to be convex in the wide sense if it is convex in the usual sense on some subinterval of $\mathbb{R}$, which can coincide with $\mathbb{R}$, can be finite or semi-infinite, and is equal to $+\infty$ on the complement of this interval. For a non-negative function $\Lambda(t)$ on $\mathbb{R}$, the function $-\ln \Lambda(t)$ is convex in the wide sense if and only if the inequalities (8.19) hold for $m = 2$ and for all $t_1 < t_2$, $s_1 < s_2$.)

THEOREM I (I.J. Schoenberg, Theorem 1 in [Scho2]).

I. Let $\Lambda(t)$ be a totally positive function that is not of the form (8.21) and let the function $\Psi(z)$ be defined by means of (8.22) as a meromorphic function in the strip $\alpha < \operatorname{Re} z < \beta$ (see (8.20)).

Then $\Psi(z)$ is holomorphic in this strip and admits an analytic continuation to the whole complex plane $\mathbb{C}$. The continued function (denoted by $\Psi(z)$ as well) is an entire function of the second type which is not of the form

$$\Psi(z) = c\, e^{az}, \quad \text{where } a \text{ and } c \text{ are real constants},\ c \ne 0. \tag{8.23}$$

The function $\Lambda(t)$ can be recovered from $\Psi(z)$ by means of the inversion formula

$$\Lambda(t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} \frac{1}{\Psi(z)}\, e^{-zt}\, dz, \qquad -\infty < t < \infty, \tag{8.24}$$

where $\gamma$ is an arbitrary real number from $(\alpha, \beta)$.

II. Let $\Psi(z)$ be an entire function of the second type that is not of the form (8.23), and let $\Psi(x)$ be strictly positive on an interval $(\alpha, \beta) \subset \mathbb{R}$ (so that the reciprocal function $\frac{1}{\Psi(z)}$ is holomorphic in the vertical strip $\alpha < \operatorname{Re} z < \beta$). Let the function $\Lambda(t)$ be defined from this $\Psi(z)$ by means of the integral (8.24). Then the function $\Lambda(t)$ is totally positive, and if the interval $(\alpha, \beta)$ is the maximal interval on which the function $\Psi(x)$ is positive, then the endpoints $\alpha$ and $\beta$ of this interval coincide with the limits $\alpha$ and $\beta$ in (8.20).

The proof of this theorem is based essentially on methods and results from the paper [PS]. Indeed, the names of Polya and Schur (and Laguerre) appear in the title of [Scho2], and as Schoenberg himself writes, "A proof of Theorem 1 is essentially based on the results and methods developed by Polya and Schur. The only additional element required is a set of sufficient conditions insuring that a linear transformation be variation diminishing."

Notes on Theorem I. (a) An "entire function of the second type" is understood in the sense of the paper [PS]; see Definition II above in this section. (b) The value of the integral in (8.24) does not depend on the choice of $\gamma \in (\alpha, \beta)$. (c) The integral (8.24) converges absolutely for every entire function $\Psi(z)$ of the second type, except when $\Psi(z)$ is of the form $c\,e^{az}(1 + \delta z)$, where $c \ne 0$, $\delta \ne 0$, and $c, \delta, a$ are real, in which case the integral (8.24) converges in the sense of principal values. (d) The interval $(\alpha, \beta)$ is maximal if either $\Psi(\alpha) = 0$ or $\alpha = -\infty$, and if either $\Psi(\beta) = 0$ or $\beta = \infty$.

For the sake of added perspective, we shall sketch the proof of part II, which is not difficult (once the theorem has been formulated), but shall omit the proof of part I, which is not so simple and straightforward. If the function $\Psi(z)$ is "a linear factor", i.e., if $\Psi(z) = (1 + \delta z)$, $\delta \ne 0$, when $1 + \delta\gamma > 0$ and $\Psi(z) = -(1 + \delta z)$, $\delta \ne 0$, when $1 + \delta\gamma < 0$, then evaluating the integral (8.24) by residues gives a one-sided exponential function:

$$\Lambda(t) = \begin{cases} |\delta|^{-1} e^{t/\delta}, & \text{if } t\,\delta\,(1 + \delta\gamma) < 0, \\ 0, & \text{if } t\,\delta\,(1 + \delta\gamma) > 0. \end{cases} \tag{8.25}$$

This function $\Lambda(t)$ is totally positive. If the formula (8.24) is used to construct the function $\Lambda_j(t)$ from $\Psi_j(z)$, $j = 1, 2$, and the function $\Lambda(t)$ from the product $\Psi(z) = \Psi_1(z)\Psi_2(z)$, and if the same $\gamma$ is used for all three constructions, then

$$\Lambda(t) = \int_{-\infty}^{\infty} \Lambda_1(t - \xi)\, \Lambda_2(\xi)\, d\xi. \tag{8.26}$$

Moreover, if $\Lambda_1$ and $\Lambda_2$ are totally positive, then the function $\Lambda$ is totally positive as well. Therefore, if $\Psi(z) = \pm \prod_{1 \le k \le n} (1 + \delta_k z)$ is a polynomial with real roots, then the function $\Lambda(t)$ defined by (8.24) is totally positive. Finally, if $\Psi(z)$ is an entire function of the second type, then there exists a sequence of polynomials $\Psi_n(z)$ with real roots such that $\Psi_n(z) \to \Psi(z)$ and, correspondingly, $\Lambda_n(t) \to \Lambda(t)$. Therefore, if $\Psi(z)$ is an entire function of the second type, then the corresponding function $\Lambda(t)$ that is defined by formula (8.24) is totally positive. Thus, part II of the theorem is proved.

The statement that the difference kernel $K(t, \tau) = \Lambda(t - \tau)$ is variation diminishing if and only if the function $\Lambda(t)$ is of the form (8.24), where $\Psi(z)$ is an entire function of the second type, was formulated explicitly in [Scho3]. In particular, a difference kernel $K$ is variation diminishing if and only if either the kernel $K$ or the kernel $-K$ is totally positive. (This agrees with the results of Schoenberg and Motzkin on general variation diminishing transforms, since a difference kernel $K(t, \tau) = \Lambda(t - \tau)$ is sign regular if and only if either the kernel $K(t, \tau)$ or the kernel $-K(t, \tau)$ is totally positive.) Many results related to totally positive and variation diminishing difference kernels can be found in [HiWi], Chapter IV, and especially in [Kar], Chapter 7.

Discrete totally positive difference kernels were first considered in [AESW], [ASW] and [Edr]. The formulations are analogous to the formulations for continuous difference kernels, but the proofs are more difficult and use tools from value distribution theory for meromorphic functions.

THEOREM (A. Aissen, A. Edrei, I.J. Schoenberg, A. Whitney, [AESW], [ASW], [Edr]).

I. Let $\{s_k\}_{0 \le k < \infty}$ be a totally positive (unilateral) sequence with $s_0 = 1$. Then the series

$$F(z) = \sum_{0 \le k < \infty} s_k z^k \tag{8.27}$$

converges in a neighborhood of the origin to a function of the form

$$F(z) = e^{\gamma z}\, \frac{\prod_k (1 + \alpha_k z)}{\prod_k (1 - \beta_k z)} \qquad \Big(\alpha_k \ge 0,\ \beta_k \ge 0,\ \gamma \ge 0,\ \sum_k (\alpha_k + \beta_k) < \infty\Big). \tag{8.28}$$

II. Let $F(z)$ be a function of the form (8.28) and let (8.27) be its Taylor expansion in the vicinity of the origin. Then the sequence $\{s_k\}_{0 \le k < \infty}$ is totally positive.

This theorem provides a parametrization of the set of all totally positive unilateral sequences (under the normalizing condition $s_0 = 1$). The sequences $\alpha_k$, $\beta_k$ and the number $\gamma$ serve as independent parameters. Various results related to unilateral and bilateral totally positive sequences can be found in [Kar], Chapter 8.
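The parametrization can be explored on a small example. The following sketch is an illustration under stated assumptions: it takes finitely many parameters $\alpha_k$, $\beta_k$ and a number $\gamma$, computes the first Taylor coefficients of the corresponding function of the form (8.28) in exact rational arithmetic, and checks by brute force that all minors of a finite section of the Toeplitz matrix $\|s_{p-q}\|$ are non-negative. The function names and the chosen parameter values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def series_mul(a, b, n):
    """First n coefficients of the product of two power series."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


def taylor_coefficients(alphas, betas, gamma, n):
    """First n coefficients of exp(gamma z) * prod(1 + alpha z) / prod(1 - beta z)."""
    s = [Fraction(gamma) ** k / factorial(k) for k in range(n)]
    for a in alphas:
        factor = ([Fraction(1), Fraction(a)] + [Fraction(0)] * n)[:n]
        s = series_mul(s, factor, n)
    for b in betas:
        geometric = [Fraction(b) ** k for k in range(n)]  # 1 / (1 - beta z)
        s = series_mul(s, geometric, n)
    return s


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def toeplitz_section_is_totally_positive(s):
    """Check all minors of the n x n section of the matrix k[p][q] = s[p - q]."""
    n = len(s)
    k = [[s[p - q] if p >= q else Fraction(0) for q in range(n)] for p in range(n)]
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                if det([[k[i][j] for j in cols] for i in rows]) < 0:
                    return False
    return True


s = taylor_coefficients([Fraction(1, 2)], [Fraction(1, 3)], 1, 5)
print(s[0], toeplitz_section_is_totally_positive(s))
```

Only a finite section of the infinite Toeplitz matrix is tested, so a positive answer is consistent with part II of the theorem but is not a proof of total positivity.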

This parametrization of totally positive sequences plays an essential role in the theory of representations of the infinite symmetric group. It appears in the description of nondecomposable positive definite functions. This was discovered by Elmar Thoma in [Tho], where some earlier results on totally positive sequences were rediscovered. The generating functions of the sequences that appear there are of the form (8.28), with $\gamma = 0$, and $\sum_k \alpha_k + \sum_k \beta_k \le 1$. The theory of totally positive functions and sequences is used in approximation theory, mathematical statistics and in other fields. References can be found in [Kar] and in [GaMic]. Recently, a surprising connection between total positivity and canonical bases for quantum groups was discovered by G. Lusztig; see [FoZe].

The methods and especially the ideology of the paper [PS] underlie some of the work of B.Ya. Levin that is considered in Chapter IX of his monograph [Lev]. The famous S.N. Bernstein inequality can be formulated in the following form: Let $f(z)$ be an entire function of exponential type $\sigma_f$. If the inequality $|f(x)| \le |e^{i\sigma x}|$ holds for all real $x$, $-\infty < x < \infty$, and if $\sigma_f \le \sigma$, then the derivative $f'(x)$ satisfies the inequality $|f'(x)| \le |(e^{i\sigma x})'|$ $(-\infty < x < \infty)$.

In other words, the operator $\frac{d}{dx}$ preserves inequalities on the real axis for some classes of entire functions. Levin has investigated the general form of linear operators which preserve inequalities of this sort. In this investigation, the linear operators that preserve the class of entire functions that is obtained as the closure of polynomials with zeros in the open right half plane play a crucial role. The operators of the form (8.1), which were introduced and investigated in the paper [PS], are precisely those that commute with the operator $z\frac{d}{dz}$.

9. The Schur class of holomorphic functions, and the Schur algorithm.

The papers [Sch9]-[Sch10] are probably the best known contributions of Issai Schur to analysis. In these papers Schur introduced a new parametrization of functions that are holomorphic and bounded by one in the open unit disk $\mathbb{D}$ and an algorithm for calculating these parameters. These ideas and their subsequent generalizations to matrix and operator valued functions are widely used in a variety of applications that range from signal processing to the study of Pisot and Salem numbers.

DEFINITION 1. A function $s(z)$ that is holomorphic in the open unit disk $\mathbb{D}$ and satisfies the inequality

$$|s(z)| \le 1 \quad \text{for all points } z \in \mathbb{D} \tag{9.1}$$

is said to belong to the Schur class $S$. A function $s \in S$ will be referred to as a Schur function. A Schur function $s \in S$ is said to be inner if the absolute value of the radial limit

$$s(t) = \lim_{r \to 1 - 0} s(rt) \tag{9.2}$$

is equal to one a.e. with respect to Lebesgue measure on the unit circle. The set of inner functions will be denoted by the symbol $S_{in}$. The set of rational inner functions will be denoted $S_{rin}$.

The simplest inner functions are finite Blaschke products:

$$b(z) = c \prod_{k=1}^{n} \frac{z - a_k}{1 - \overline{a_k}\, z}, \tag{9.3}$$

where $c$ is a constant of modulus one, $n$ is a non-negative integer and $a_k \in \mathbb{D}$.



The Schur algorithm was introduced by I. Schur in Section 1 of [Sch9]. It exploits the fact that if $\gamma \in \mathbb{D}$, then the linear fractional transformation

$$\zeta \longmapsto \frac{\zeta - \gamma}{1 - \zeta\bar{\gamma}} \tag{9.4}$$

is a one to one mapping of the open unit disk $\mathbb{D}$ onto itself and a one to one mapping of the unit circle $\mathbb{T}$, i.e., the boundary of $\mathbb{D}$, onto itself. If $|\gamma| = 1$ the transformation (9.4) maps the set $\mathbb{C} \setminus \{\gamma\}$ into the point $\{-\gamma\}$ and is not defined at the point $\gamma$.

Let $f \in S$ be a Schur function that is not a constant of modulus one. Then $|f(0)| < 1$ and hence, in view of the properties of (9.4), the transformation

$$f(z) \longmapsto \frac{f(z) - f(0)}{1 - f(z)\overline{f(0)}} \tag{9.5}$$

maps $S$ into $\{s \in S : s(0) = 0\}$. Therefore, by the Schwarz Lemma, the transformation

$$f(z) \longmapsto \frac{f(z) - f(0)}{1 - f(z)\overline{f(0)}} \cdot \frac{1}{z} \tag{9.6}$$

maps $\{s \in S : |s(0)| \ne 1\}$ onto the class $S$ of all Schur functions. In particular if $f \in S \setminus S_{rin}$, then $|f(0)| < 1$ and the transformation (9.6) is well defined. It is easy to see that:

PROPOSITION 1. The transformation (9.6) maps $S \setminus S_{rin}$ into itself.

PROPOSITION 2. The transformation (9.6) maps rational inner functions $f \in S_{rin}$ of degree $n$, $n \ge 1$, into rational inner functions of degree $n - 1$.

DESCRIPTION OF THE SCHUR ALGORITHM. The Schur algorithm defines a sequence of Schur functions $\{s_k(z)\}_{0 \le k < \infty}$ starting from a given Schur function $s(z)$ that is assigned the index zero:

$$s_0(z) = s(z), \tag{9.7}$$

$$s_k(z) = \frac{s_{k-1}(z) - s_{k-1}(0)}{1 - s_{k-1}(z)\overline{s_{k-1}(0)}} \cdot \frac{1}{z} \qquad (k = 1, 2, 3, \dots). \tag{9.8}$$


SCHUR PARAMETERS. Let $s \in S$ and let $\{s_k\}$ be the sequence (finite or infinite) of functions generated by the Schur algorithm with $s_0(z) = s(z)$. The numbers

$$\gamma_k = s_k(0) \tag{9.9}$$

are termed the Schur parameters of the function $s$.

If the starting function $s \notin S_{rin}$, then, by Proposition 1, the algorithm continues indefinitely and produces infinitely many Schur functions $s_k(z)$, $k = 0, 1, 2, 3, \dots$ and generates an infinite sequence of Schur parameters $\{\gamma_k\}_{0 \le k < \infty}$. In this case

$$|\gamma_k| < 1, \qquad k = 0, 1, 2, \dots. \tag{9.10}$$

If the starting function $s \in S_{rin}$ is a rational inner function of degree $n$, then, by Proposition 2, the algorithm terminates after $n$ steps. In this case, it generates a finite sequence of Schur parameters

$$|\gamma_k| < 1, \quad k = 0, 1, 2, \dots, n - 1, \quad \text{and} \quad |\gamma_n| = 1. \tag{9.11}$$
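The recursion (9.7)-(9.9) can be carried out directly on Taylor coefficients. The sketch below is a minimal illustration, assuming the coefficients are real rationals so that exact arithmetic is available; the function name `schur_parameters` is hypothetical. Each step divides two truncated power series, so one coefficient is used up per parameter.

```python
from fractions import Fraction


def schur_parameters(coeffs):
    """Schur parameters obtained from finitely many Taylor coefficients c_0..c_N.

    Stops when the data run out or when a parameter of modulus >= 1 appears.
    """
    s = [Fraction(c) for c in coeffs]
    gammas = []
    while s:
        g = s[0]                      # gamma_k = s_k(0)
        gammas.append(g)
        if abs(g) >= 1:               # modulus one ends the algorithm
            break
        num = s[1:]                   # coefficients of (s_k(z) - gamma_k) / z
        # coefficients of 1 - conj(gamma_k) * s_k(z)
        den = [1 - g.conjugate() * s[0]] + [-g.conjugate() * c for c in s[1:]]
        quotient = []                 # truncated power-series division num / den
        for k in range(len(num)):
            acc = num[k] - sum(den[k - i] * quotient[i] for i in range(k))
            quotient.append(acc / den[0])
        s = quotient
    return gammas


# Taylor coefficients of the Blaschke factor (z + 1/2) / (1 + z/2).
blaschke = [Fraction(1, 2)] + [Fraction(3, 4) * Fraction(-1, 2) ** (k - 1) for k in range(1, 6)]
print(schur_parameters(blaschke))
```

For this rational inner function of degree one the algorithm stops after one step with a second parameter of modulus one, in agreement with (9.11).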



The Schur parameter $\gamma_k(s)$ of a Schur function

$$s(z) = \sum_{k=0}^{\infty} c_k(s) z^k \tag{9.12}$$

depends only on the Taylor coefficients $c_0(s), c_1(s), \dots, c_k(s)$ of the function $s$:

$$\gamma_k(s) = \Phi_k(c_0(s), c_1(s), \dots, c_k(s)), \tag{9.13}$$

where $\Phi_k(c_0, c_1, \dots, c_k)$ is a rational function of the variables $c_0, \overline{c_0}, c_1, \overline{c_1}, \dots, c_{k-1}, \overline{c_{k-1}}, c_k$.

Conversely, the Taylor coefficient $c_k(s)$ of a Schur function $s$ depends only on the Schur parameters $\gamma_0(s), \gamma_1(s), \dots, \gamma_k(s)$ of this function:

$$c_k(s) = \Psi_k(\gamma_0(s), \gamma_1(s), \dots, \gamma_k(s)), \tag{9.14}$$

where $\Psi_k(\gamma_0, \gamma_1, \dots, \gamma_k)$ is a polynomial in $\gamma_0, \overline{\gamma_0}, \gamma_1, \overline{\gamma_1}, \dots, \gamma_{k-1}, \overline{\gamma_{k-1}}, \gamma_k$.

Explicit expressions for $\Phi_k$ and $\Psi_k$ are given in [Sch9].

DEFINITION 4. A sequence $\{\gamma_k\}_{0 \le k < \infty}$ of complex numbers is said to be strictly contractive if $|\gamma_k| < 1$ for every $k$.

Thus, the sequence of Schur parameters of a Schur function $s \in S \setminus S_{rin}$ is strictly contractive. Moreover, every preassigned strictly contractive sequence $\{\gamma_0, \gamma_1, \gamma_2, \dots, \gamma_k, \dots\}$ is the sequence of Schur parameters for some unique Schur function $s \in S \setminus S_{rin}$. Such a function can be constructed by means of a continued fraction algorithm.

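SCHUR CONTINUED FRACTIONS. Given an arbitrary strictly contractive sequence $\{\gamma_0, \gamma_1, \dots, \gamma_k, \dots\}$ of complex numbers, one can construct a sequence of rational Schur functions which converge to a Schur function $s$ with Schur parameters $\{\gamma_0(s), \gamma_1(s), \dots, \gamma_k(s), \dots\}$ that coincide with the preassigned sequence. The construction is based on the inverse of the transformation

$$\omega \longmapsto \frac{\omega - \gamma}{1 - \bar{\gamma}\,\omega} \cdot \frac{1}{z}, \tag{9.15}$$

i.e., on the transformation

$$\omega \longmapsto \frac{\gamma + z\,\omega}{1 + \bar{\gamma}\, z\, \omega}, \tag{9.16}$$

which also maps $S$ into $S$. We use the 'inverse Schur algorithm' recursively to construct the $n$-th Schur approximant, which (following Schur) we will denote by $[z; \gamma_0, \gamma_1, \dots, \gamma_n]$. Namely, we write

$$[z; \gamma_n] = \gamma_n, \qquad [z; \gamma_k, \gamma_{k+1}, \gamma_{k+2}, \dots, \gamma_n] = \frac{\gamma_k + z \cdot [z; \gamma_{k+1}, \gamma_{k+2}, \dots, \gamma_n]}{1 + \overline{\gamma_k} \cdot z \cdot [z; \gamma_{k+1}, \gamma_{k+2}, \dots, \gamma_n]}, \tag{9.17}$$

$$k = n - 1,\ n - 2,\ \dots,\ 1,\ 0.$$

The function $[z; \gamma_0, \gamma_1, \dots, \gamma_n]$ is a rational Schur function whose Schur parameters $\gamma_k([z; \gamma_0, \gamma_1, \dots, \gamma_n])$ are equal to

$$\gamma_k([z; \gamma_0, \gamma_1, \dots, \gamma_n]) = \gamma_k \ \text{ for } k = 0, 1, \dots, n; \qquad \gamma_k([z; \gamma_0, \gamma_1, \dots, \gamma_n]) = 0 \ \text{ for } k > n.$$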



Let $n_1$ and $n_2$ be two nonnegative integers. Since the Schur parameters with index $k$: $0 \le k \le \min(n_1, n_2)$ for the functions $[z; \gamma_0, \gamma_1, \dots, \gamma_{n_1}]$ and $[z; \gamma_0, \gamma_1, \dots, \gamma_{n_2}]$ coincide, the Taylor coefficients $c_k^{(1)}$ and $c_k^{(2)}$ ($0 \le k \le \min(n_1, n_2)$) for these two functions coincide as well. Hence,

$$[z; \gamma_0, \gamma_1, \dots, \gamma_{n_1}] - [z; \gamma_0, \gamma_1, \dots, \gamma_{n_2}] = \sum_{\min(n_1, n_2) < k < \infty} \big(c_k^{(1)} - c_k^{(2)}\big) z^k.$$

Using the estimates $|[z; \gamma_0, \gamma_1, \dots, \gamma_{n_1}]| \le 1$, $|[z; \gamma_0, \gamma_1, \dots, \gamma_{n_2}]| \le 1$ for $z \in \mathbb{D}$ and the Schwarz Lemma, we obtain the inequality

$$\big| [z; \gamma_0, \gamma_1, \dots, \gamma_{n_1}] - [z; \gamma_0, \gamma_1, \dots, \gamma_{n_2}] \big| \le 2\, |z|^{\min(n_1, n_2) + 1} \quad \text{for } z \in \mathbb{D}. \tag{9.18}$$

From (9.18) it follows that the limit

$$[z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots] = \lim_{n \to \infty} [z; \gamma_0, \gamma_1, \dots, \gamma_n] \tag{9.19}$$

exists in $\mathbb{D}$. The function $[z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots]$ is said to be the Schur continued fraction constructed from the sequence $\{\gamma_0, \gamma_1, \dots, \gamma_k, \dots\}$.
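The backward recursion that defines the Schur approximants is easy to evaluate numerically. The sketch below is an illustration in floating-point complex arithmetic; the function name `schur_approximant` and the sample parameters are hypothetical. It evaluates $[z; \gamma_0, \dots, \gamma_n]$ at a point by starting from $\gamma_n$ and applying the step $\omega \mapsto (\gamma_k + z\omega)/(1 + \overline{\gamma_k} z \omega)$ for $k = n-1, \dots, 0$.

```python
def schur_approximant(z, gammas):
    """Evaluate [z; gamma_0, ..., gamma_n] by the backward recursion."""
    value = complex(gammas[-1])                 # [z; gamma_n] = gamma_n
    for g in reversed(gammas[:-1]):             # k = n-1, n-2, ..., 0
        g = complex(g)
        value = (g + z * value) / (1 + g.conjugate() * z * value)
    return value


gammas = [0.5, 0.3j, -0.2, 0.4, 0.1]            # a strictly contractive sample
z = 0.6 + 0.3j
short = schur_approximant(z, gammas[:3])        # n_1 = 2
full = schur_approximant(z, gammas)             # n_2 = 4
print(abs(short), abs(full), abs(short - full), 2 * abs(z) ** 3)
```

With these sample values both approximants have modulus at most one, and their difference is bounded by $2|z|^{3}$, as the Schwarz Lemma argument predicts for $\min(n_1, n_2) = 2$.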

The function $[z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots] \in S \setminus S_{rin}$. Its Schur parameters $\gamma_k([z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots])$ coincide with the numbers $\gamma_k$:

$$\gamma_k([z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots]) = \gamma_k, \qquad 0 \le k < \infty.$$

Given $s \in S \setminus S_{rin}$, we can form the sequence $\{\gamma_0(s), \gamma_1(s), \dots, \gamma_k(s), \dots\}$ of its Schur parameters and then construct the Schur continued fraction $[z; \gamma_0(s), \gamma_1(s), \dots, \gamma_k(s), \dots]$. The function represented by this fraction is a Schur function whose Schur parameters coincide with the sequence of Schur parameters of the original function $s$. Hence, the Taylor coefficients of these two functions coincide as well. Thus, we are led to the following result:

THEOREM (I. Schur, [Sch9]).

I. Every $s \in S \setminus S_{rin}$ admits the continued fraction expansion

$$s(z) = [z; \gamma_0(s), \gamma_1(s), \dots, \gamma_k(s), \dots]. \tag{9.20}$$

II. A Schur function $s \in S_{rin}$ of degree $n$ admits the representation

$$s(z) = [z; \gamma_0(s), \gamma_1(s), \dots, \gamma_n(s)]. \tag{9.21}$$

DEFINITION 5. Let $s \in S \setminus S_{rin}$, let $n$ be a non-negative integer and let (9.20) be the Schur continued fraction expansion of the function $s$. Then the function

$$p_n(s; z) = [z; \gamma_0(s), \gamma_1(s), \dots, \gamma_n(s)] \tag{9.22}$$

is said to be the $n$-th Schur approximant of the function $s$.

REMARK 1. The $n$-th Schur approximant is a rational function of $z$ whose numerator and denominator are polynomials of degree not greater than $n$. In fact the $n$-th Schur approximant of an $s \in S \setminus S_{rin}$ is the $n$-th convergent of its Schur continued fraction expansion.

The estimate

$$|s(z) - p_n(s; z)| \le 2\, |z|^{n+1}, \tag{9.23}$$

which follows from the Schwarz Lemma, holds for every $s \in S \setminus S_{rin}$ and implies that the sequence of the Schur approximants of such an $s$ converges to it locally uniformly in the open unit disk $\mathbb{D}$. This result on the locally uniform convergence of the approximants $p_n(s; z)$ to $s(z)$ in the open unit disk $\mathbb{D}$ appears in [Sch9] (with the rougher estimate $|s(z) - p_n(s; z)| \le 2\,|z|^{n+1}(1 - |z|)^{-1}$). The problem of convergence of Schur approximants to $s$ on the unit circle $\mathbb{T}$ is much more difficult. This problem was studied in [Nja] and [Khru2].

The preceding results imply that the correspondence

$$\{\gamma_0, \gamma_1, \dots, \gamma_k, \dots\} \longleftrightarrow [z; \gamma_0, \gamma_1, \dots, \gamma_k, \dots] \tag{9.24}$$

is a free parametrization of the class of all $s \in S \setminus S_{rin}$ by means of the set of all strictly contractive sequences $\{\gamma_0, \gamma_1, \gamma_2, \dots, \gamma_k, \dots\}$; these sequences serve as free parameters of this class. This is important because the geometry of the set of all Taylor coefficients of functions of this class is rather complicated, whereas the geometry of the set of all Schur parameters is very simple: it is just the direct product of the open unit disks. This geometry is compatible with probabilistic structures and is well suited for probabilistic study. Some results on Schur functions with random Schur parameters are obtained in [Kits].

REMARK 2. It is not easy to express the properties of a concrete Schur function $s$ in terms of its Schur parameters $\gamma_k(s)$. (However, to express the properties of a Schur function in terms of its Taylor coefficients is, as a rule, even more difficult.) In particular, it is not easy to recognize whether the function $s$ is inner or not. Not much is known about this.

If $\sum_{0 \le k < \infty} |\gamma_k(s)| < \infty$, then the function $s$ is continuous in the closed unit disk $\overline{\mathbb{D}}$, and $\max_{z \in \overline{\mathbb{D}}} |s(z)| < 1$. (Of course, $s$ is not inner.) This result was obtained by I. Schur, [Sch10], §15, Theorem XVIII.

If $\sum_{0 \le k < \infty} |\gamma_k(s)|^2 < \infty$, then again the function $s$ is not inner, as follows from the identity

$$\prod_{0 \le k < \infty} \big(1 - |\gamma_k(s)|^2\big) = \exp\Big\{ \int_{\mathbb{T}} \ln\big(1 - |s(t)|^2\big)\, m(dt) \Big\}.$$

(See [Boy] and also formula (8.14) in [Ger3], which expresses a similar result for polynomials that are orthogonal on $\mathbb{T}$.)

If $\lim_{k \to \infty} |\gamma_k(s)| = 1$, then the function $s$ is inner. An equivalent result was obtained by E.A. Rakhmanov in the setting of orthogonal polynomials on the unit circle, [Rakh]. It is known as Rakhmanov's Lemma. A simple function-theoretic proof of the Rakhmanov lemma in the setting of Schur functions can be found in [Kats].

If the sequence of Schur parameters $\{\gamma_k(s)\}_{0 \le k < \infty}$ satisfies the Mate-Nevai condition $\lim_{k \to \infty} \gamma_k \gamma_{k+n} = 0$ for $n = 1, 2, 3, \dots$, but $\limsup_{k \to \infty} |\gamma_k| > 0$, then $s$ is an inner function. This is Theorem 5 and Corollary 9.1 in [Khru2].

It is also known that there exists an infinite Blaschke product $s$ such that $\sum_{0 \le k < \infty} |\gamma_k(s)|^p < \infty$ for every $p > 2$. (This is shown in [Khru3].)

The following question is both natural and important:

QUESTION 1. Given a sequence $c = \{c_0, c_1, \dots\}$, does there exist a function $s \in S$ such that

$$\frac{s^{(j)}(0)}{j!} = c_j \quad \text{for } j = 0, 1, \dots?$$

Schur obtained an answer to this question by using the algorithm (9.7)-(9.8) starting with

$$s_0(z) = f(z) = \sum_{j=0}^{\infty} c_j z^j, \tag{9.25}$$

to calculate the parameters $\gamma_j$. All the series are formal. However, since the Schur parameters $\gamma_0, \dots, \gamma_k$ only depend upon $c_0, \dots, c_k$, this does not present a problem. Proceeding this way, Schur obtained the following answer to Question 1.

In order for the series (9.25) to be the Taylor series of a Schur function, it is necessary and sufficient that either $|\gamma_k(f)| < 1$ for every integer $k$: $0 \le k < \infty$, or $|\gamma_k(f)| < 1$ for $k$: $0 \le k < n$, and $|\gamma_n(f)| = 1$. In the second case the coefficients $c_k$ of the series (9.25) coincide with the $k$-th Taylor coefficients of the function $[z; \gamma_0(f), \gamma_1(f), \dots, \gamma_n(f)]$ for every $k$: $0 \le k < \infty$. (For $k = 0, 1, \dots, n$ this coincidence holds automatically since the Schur parameters $\gamma_0(f), \gamma_1(f), \dots, \gamma_n(f)$ are built from $c_0, c_1, \dots, c_n$; the remaining coefficients $c_k$ for $k > n$ are determined by $c_0, c_1, \dots, c_n$.)

Schur also considered the following related question:

QUESTION 2. Given a finite set of complex numbers $\{c_0, \dots, c_m\}$, does there exist a function $s \in S$ such that

$$\frac{s^{(j)}(0)}{j!} = c_j \quad \text{for } j = 0, 1, \dots, m? \tag{9.26}$$

Moreover, if such functions exist, how can one describe them?

Schur answered Question 2 in terms of the Schur parameters generated by the algorithm (9.7)-(9.8) starting with

$$s_0(z) = g(z) = \sum_{j=0}^{m} c_j z^j. \tag{9.27}$$

THEOREM 1 (I. Schur, [Sch9]). There exists a function $s \in S$ that meets the interpolation condition (9.26) if and only if: either $|\gamma_k(g)| < 1$ for every integer $k$: $0 \le k \le m$, or $|\gamma_k(g)| < 1$ for $k$: $0 \le k < n$, and $|\gamma_n(g)| = 1$ for some $n$, $n \le m$. In the second case, the interpolating Schur function $s(z)$ is unique, namely, $s(z) = [z; \gamma_0(g), \gamma_1(g), \dots, \gamma_n(g)]$. In the first case, there are infinitely many interpolating functions $s(z)$. Moreover, the first $m + 1$ Schur parameters $\gamma_k(s)$, $k = 0, 1, \dots, m$, of every interpolant $s$ coincide with the Schur parameters $\gamma_k(g)$ of the given polynomial. The remaining parameters $\gamma_k(s)$ with $k > m$ are either an infinite sequence of arbitrary strictly contractive complex numbers if $s \in S \setminus S_{rin}$, or a finite sequence of strictly contractive numbers that terminates with $|\gamma_n(s)| = 1$ for some $n > m$. Moreover, all interpolants are obtained this way.

Schur also formulated another criterion for the solvability of the interpolation problem (9.26) in terms of an $(m + 1) \times (m + 1)$ upper triangular Toeplitz matrix based on the coefficients of the given polynomial:

$$C_m = \begin{bmatrix} c_0 & c_1 & \cdots & c_m \\ 0 & c_0 & \cdots & c_{m-1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & c_0 \end{bmatrix}. \tag{9.28}$$

THEOREM 2 (I. Schur, [Sch9]). The given polynomial $g(z) = c_0 + c_1 z + \dots + c_m z^m$ can be interpolated by a function $s \in S$ if and only if the Hermitian form based on the matrix $I - C_m^* C_m$ is non-negative:

$$\big( (I - C_m^* C_m)\, x,\ x \big) \ge 0 \quad \text{for all } x \in \mathbb{C}^{m+1}, \tag{9.29}$$

where $I$ is the identity matrix in $\mathbb{C}^{m+1}$ and $(\ ,\ )$ is the standard scalar product in $\mathbb{C}^{m+1}$. The Hermitian form is strictly positive if and only if there exists more than one interpolating function $s \in S$.
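For $m = 1$ the two criteria can be compared by hand, and the comparison is short enough to code. The sketch below is an illustration under assumptions: it takes real rational data $c_0, c_1$, uses $\gamma_0 = c_0$ and $\gamma_1 = c_1/(1 - |c_0|^2)$ for the parameter criterion (with the convention that $|c_0| = 1$ forces $c_1 = 0$, since the only interpolant is then a constant), and tests non-negativity of the $2 \times 2$ Hermitian matrix $I - C_1^* C_1$ with $C_1$ upper triangular Toeplitz. The function names are hypothetical.

```python
from fractions import Fraction


def parameter_criterion(c0, c1):
    """Solvability for m = 1 via the Schur parameters gamma_0, gamma_1."""
    if abs(c0) > 1:
        return False
    if abs(c0) == 1:
        return c1 == 0                 # only a unimodular constant can interpolate
    return abs(c1) / (1 - abs(c0) ** 2) <= 1


def hermitian_criterion(c0, c1):
    """Solvability for m = 1 via I - C*C >= 0, where C = [[c0, c1], [0, c0]]."""
    a = 1 - abs(c0) ** 2                       # (1,1) entry of I - C*C
    d = 1 - abs(c0) ** 2 - abs(c1) ** 2        # (2,2) entry
    b = -c0.conjugate() * c1                   # (1,2) entry
    # A 2 x 2 Hermitian matrix is non-negative iff its diagonal and determinant are.
    return a >= 0 and d >= 0 and a * d - abs(b) ** 2 >= 0


grid = [Fraction(k, 4) for k in range(-6, 7)]
agree = all(parameter_criterion(c0, c1) == hermitian_criterion(c0, c1)
            for c0 in grid for c1 in grid)
print(agree)
```

The agreement reflects the identity $\det(I - C_1^* C_1) = (1 - |c_0|^2)^2 - |c_1|^2$.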

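We remark that matrices of the form (9.28) also figure in the well known Schur-Cohn test. Let $C_{m-1}$ be the $m \times m$ matrix of the form (9.28) based on the coefficients $c_0, \dots, c_{m-1}$, and let $D_{m-1}$ be the analogous $m \times m$ matrix based on the first $m$ coefficients $\overline{c_m}, \dots, \overline{c_1}$ of the reflected polynomial $z^m\, \overline{g(1/\bar{z})} = \overline{c_m} + \overline{c_{m-1}}\, z + \dots + \overline{c_0}\, z^m$. Then:

The roots of the polynomial $g(z)$ lie in $\mathbb{D}$ if and only if $D_{m-1}^* D_{m-1} - C_{m-1}^* C_{m-1} > 0$.

A nice proof of this result based on Schur parameters may be found in the first chapter of [FoFr].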



Schur derived the criterion f l9.29p for the solvabihty of the interpolation problem (19.261) 
from the criterion for solvability in terms of Schur parameters (that was formulated as 
Theorem 1). As a by product of this derivation, he obtained a formula for the factorization 
of a 2 X 2 square block matrix 

" A B 
C D 



M 



with square block diagonal entries A and C (not necessarily of the same size) when the 
matrix A is invertible, which in turn leads easily to the identity 



" A 


B ' 




I 


" 




C 


D 




CA~^ 


/ 





A 

D - CA-^B 



I A-^B 
/ 



(9.30) 



The matrix $D - CA^{-1}B$ is termed the Schur complement of the block entry $A$ with respect to $M$. If it is also invertible, then $M$ is invertible and the last formula leads easily to a formula for the inverse matrix $M^{-1}$, since each of the three factors on the right hand side of (9.30) is easily inverted. The same formula implies that

$$\det M = \det A \cdot \det(D - CA^{-1}B),$$

as was also noted in [Sch9]. Schur complements are widely used in applied linear algebra and operator theory; see e.g., the long survey (of more than a hundred pages) [Oue] and [Sm], respectively.
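The block factorization and the determinant formula can be checked on a small example. The sketch below uses exact rational arithmetic from the Python standard library; the sample blocks (a $2 \times 2$ block $A$ and a $1 \times 1$ block $D$) and all helper names are hypothetical choices made for this illustration.

```python
from fractions import Fraction as F

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def det(a):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    if not a:
        return F(1)
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def inv2(a):
    """Inverse of an invertible 2 x 2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def block(a, b, c, d):
    """Assemble the block matrix [[a, b], [c, d]]."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def zeros(r, c):
    return [[F(0)] * c for _ in range(r)]

# Hypothetical sample data: A is 2 x 2 and invertible, D is 1 x 1.
A = [[F(2), F(1)], [F(1), F(3)]]
B = [[F(1)], [F(0)]]
C = [[F(0), F(4)]]
D = [[F(5)]]
M = block(A, B, C, D)

A_inv = inv2(A)
CA_inv = matmul(C, A_inv)
A_inv_B = matmul(A_inv, B)
schur = [[D[0][0] - matmul(CA_inv, B)[0][0]]]       # Schur complement D - C A^{-1} B

lower = block(eye(2), zeros(2, 1), CA_inv, eye(1))   # [[I, 0], [C A^{-1}, I]]
middle = block(A, zeros(2, 1), zeros(1, 2), schur)   # [[A, 0], [0, D - C A^{-1} B]]
upper = block(eye(2), A_inv_B, zeros(1, 2), eye(1))  # [[I, A^{-1} B], [0, I]]

product = matmul(matmul(lower, middle), upper)
```

Here `product` reproduces `M` exactly, and `det(M)` agrees with `det(A) * det(schur)`. Inverting the two triangular factors only requires changing the sign of their off-diagonal blocks, which is why the factorization yields $M^{-1}$ so easily.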



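The $k$-th step of the Schur algorithm can also be presented in the form

$$\begin{bmatrix} s_k(z) \\ 1 \end{bmatrix} = \begin{bmatrix} z^{-1} & -\gamma_{k-1} z^{-1} \\ -\overline{\gamma_{k-1}} & 1 \end{bmatrix} \begin{bmatrix} s_{k-1}(z) \\ 1 \end{bmatrix} \cdot \frac{1}{-\overline{\gamma_{k-1}}\, s_{k-1}(z) + 1},$$

where $\gamma_{k-1} = s_{k-1}(0)$. Thus, it is natural to associate the matrix

$$\begin{bmatrix} z^{-1} & -\gamma_{k-1} z^{-1} \\ -\overline{\gamma_{k-1}} & 1 \end{bmatrix}$$

with the $k$-th step of the Schur algorithm. However, it turns out to be more fruitful to deal with the matrix $m_{\gamma_{k-1}}(z)$, where

$$m_\gamma(z) = \frac{1}{\sqrt{1-|\gamma|^2}} \begin{bmatrix} z^{-1} & -\gamma z^{-1} \\ -\bar\gamma & 1 \end{bmatrix} \quad \text{when } |\gamma| < 1. \tag{9.31}$$

The matrix $m_\gamma(z)$ is also a matrix of coefficients of the linear fractional transformation (9.15), which is the basic step (9.6) of the Schur algorithm, whereas the matrix

$$m_\gamma^{-1}(z) = \frac{1}{\sqrt{1-|\gamma|^2}} \begin{bmatrix} z & \gamma \\ \bar\gamma z & 1 \end{bmatrix} \tag{9.32}$$

is the coefficient matrix of the linear fractional transformation (9.16) corresponding to the basic step of the inverse Schur algorithm.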



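The coefficient matrix of a linear fractional transformation is determined up to a nonzero scalar factor. The matrix of the linear fractional transformation (9.15) is chosen to be of the form (9.31) because then $m_\gamma(z)$ is $j$-inner with respect to the signature matrix

$$j = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \tag{9.33}$$

i.e.,

$$(m_\gamma(z)^*)^{-1}\, j\, (m_\gamma(z))^{-1} - j = (1 - |z|^2) \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix}. \tag{9.34}$$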
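The identities of this paragraph are easy to test numerically. The sketch below assumes that the forward step matrix is $m_\gamma(z) = (1-|\gamma|^2)^{-1/2} \begin{bmatrix} z^{-1} & -\gamma z^{-1} \\ -\bar\gamma & 1 \end{bmatrix}$, that its inverse is $(1-|\gamma|^2)^{-1/2} \begin{bmatrix} z & \gamma \\ \bar\gamma z & 1 \end{bmatrix}$, and that $j = \operatorname{diag}(-1, 1)$. The helper names and the sample values of $\gamma$, $z$ and $s_{k-1}(z)$ are arbitrary choices made for this illustration.

```python
from math import sqrt

def m(gamma, z):
    """Matrix of one forward Schur step, for |gamma| < 1 and z != 0."""
    c = 1 / sqrt(1 - abs(gamma) ** 2)
    return [[c / z, -c * gamma / z], [-c * gamma.conjugate(), c]]

def m_inv(gamma, z):
    """Matrix of one inverse Schur step (the inverse of m)."""
    c = 1 / sqrt(1 - abs(gamma) ** 2)
    return [[c * z, c * gamma], [c * gamma.conjugate() * z, c]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Conjugate transpose of a 2 x 2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def lft(a, x):
    """Linear fractional transformation with coefficient matrix a."""
    return (a[0][0] * x + a[0][1]) / (a[1][0] * x + a[1][1])

J = [[-1, 0], [0, 1]]  # signature matrix j

def j_defect(gamma, z):
    """(m^*)^{-1} j m^{-1} - j, computed as (m^{-1})^* j m^{-1} - j."""
    mu = m_inv(gamma, z)
    lhs = matmul2(matmul2(adjoint(mu), J), mu)
    return [[lhs[i][k] - J[i][k] for k in range(2)] for i in range(2)]

def max_abs_diff(a, b):
    return max(abs(a[i][k] - b[i][k]) for i in range(2) for k in range(2))

gamma, z, s_prev = 0.3 - 0.4j, 0.5 + 0.2j, 0.1 + 0.2j  # arbitrary sample values

# m and m_inv are mutually inverse.
identity_error = max_abs_diff(matmul2(m(gamma, z), m_inv(gamma, z)), [[1, 0], [0, 1]])

# The j-form identity: the defect equals (1 - |z|^2) * [[1, 0], [0, 0]].
j_error = max_abs_diff(j_defect(gamma, z), [[1 - abs(z) ** 2, 0], [0, 0]])

# m realizes the step s_k = (s_{k-1} - gamma) / (z (1 - conj(gamma) s_{k-1})).
direct = (s_prev - gamma) / (z * (1 - gamma.conjugate() * s_prev))
step_error = abs(lft(m(gamma, z), s_prev) - direct)
```

All three error quantities are at the level of rounding error. The right hand side of the $j$-form identity does not depend on $\gamma$, which is the practical payoff of the normalizing factor $(1-|\gamma|^2)^{-1/2}$.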



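DEFINITION. Let $\omega_m = \{\gamma_k\}_{0 \le k \le m}$ be a strictly contractive sequence of complex numbers and let the entries of the matrix valued function

$$M_{\omega_m}(z) = m_{\gamma_m}(z) \cdot m_{\gamma_{m-1}}(z) \cdot \ldots \cdot m_{\gamma_1}(z) \cdot m_{\gamma_0}(z), \quad m = 0, 1, 2, \ldots \tag{9.35}$$

be denoted as

$$M_{\omega_m}(z) = \begin{bmatrix} a_{\omega_m}(z) & b_{\omega_m}(z) \\ c_{\omega_m}(z) & d_{\omega_m}(z) \end{bmatrix}. \tag{9.36}$$

For $s(z) \in \mathcal{S} \setminus \mathcal{S}_{\mathrm{rin}}$ let $\omega = \{\gamma_k\}_{0 \le k < \infty}$ be the sequence of its Schur parameters and let $\{s_k(z)\}_{0 \le k < \infty}$ be the sequence of Schur functions generated by the Schur algorithm (so that $\gamma_k = s_k(0)$). Then

$$M_{\omega_m}(z) \begin{bmatrix} s(z) \\ 1 \end{bmatrix} = \begin{bmatrix} s_{m+1}(z) \\ 1 \end{bmatrix} \bigl( c_{\omega_m}(z)\, s(z) + d_{\omega_m}(z) \bigr) \tag{9.37}$$

and hence

$$s_{m+1}(z) = \frac{a_{\omega_m}(z)\, s(z) + b_{\omega_m}(z)}{c_{\omega_m}(z)\, s(z) + d_{\omega_m}(z)}. \tag{9.38}$$

Moreover,

$$s(z) = \frac{w_{11}(z)\, s_{m+1}(z) + w_{12}(z)}{w_{21}(z)\, s_{m+1}(z) + w_{22}(z)}, \tag{9.39}$$

where the matrix

$$W(z) = \begin{bmatrix} w_{11}(z) & w_{12}(z) \\ w_{21}(z) & w_{22}(z) \end{bmatrix} = M_{\omega_m}^{-1}(z)$$

can be expressed in terms of the entries of the matrix $M_{\omega_m}(z)$:

$$W(z) = \frac{1}{a_{\omega_m}(z)\, d_{\omega_m}(z) - b_{\omega_m}(z)\, c_{\omega_m}(z)} \begin{bmatrix} d_{\omega_m}(z) & -b_{\omega_m}(z) \\ -c_{\omega_m}(z) & a_{\omega_m}(z) \end{bmatrix}. \tag{9.40}$$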
Furthermore, since

$$W(z) = m_{\gamma_0}^{-1}(z) \cdot m_{\gamma_1}^{-1}(z) \cdot \ldots \cdot m_{\gamma_{m-1}}^{-1}(z) \cdot m_{\gamma_m}^{-1}(z), \tag{9.41}$$

and the matrix $m_\gamma^{-1}(z)$ is linear with respect to $z$, the entries of the matrix $W(z)$, a product of $m + 1$ such factors, are polynomials with respect to $z$ of degree $m + 1$ (or less). In view of (9.34), the matrix function $W(z)$ satisfies the conditions

$$W^*(z)\, j\, W(z) - j \ge 0 \quad \text{for } z \in \mathbb{D}, \tag{9.42}$$

$$W^*(t)\, j\, W(t) - j = 0 \quad \text{for } t \in \mathbb{T}. \tag{9.43}$$

Formula (9.39) (in other notation) appears in § 14 of [Sch10]. It expresses the Schur function $s(z)$ with Schur parameters $\gamma_k(s)$ that satisfy the condition $|\gamma_k(s)| < 1$, $k = 0, 1, \ldots, n$, as a linear fractional transformation of the function $s_{n+1}(z)$. It is important to note that the matrix $W(z)$ of this linear fractional transformation can be constructed from only the first $n + 1$ Schur parameters $\gamma_k(s)$, $k = 0, 1, \ldots, n$.

The Schur parameters of the functions $s(z)$ and $s_{m+1}(z)$ are related: $\gamma_k(s_{m+1}) = \gamma_{m+1+k}(s)$, $k = 0, 1, 2, \ldots$. Thus, the following result holds:

THEOREM 3. The set of all functions $s(z) \in \mathcal{S}$ whose Schur parameters $\gamma_k(s)$ coincide with a prescribed set of numbers $\gamma_k$, $|\gamma_k| < 1$, $k = 0, 1, \ldots, m$, can be parametrized by means of the linear fractional transformation

$$s(z) = \frac{w_{11}(z)\, \omega(z) + w_{12}(z)}{w_{21}(z)\, \omega(z) + w_{22}(z)}, \tag{9.44}$$

where the coefficient matrix $W(z)$ of this transformation can be constructed from only these $\gamma_k$, and the free parameter $\omega(z)$ is an arbitrary function from the class $\mathcal{S}$.
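The parametrization in Theorem 3 can be illustrated pointwise. The sketch below assumes that $W(z)$ is the product of the inverse-step matrices $(1-|\gamma_k|^2)^{-1/2} \begin{bmatrix} z & \gamma_k \\ \bar\gamma_k z & 1 \end{bmatrix}$ taken in the order $k = 0, 1, \ldots, m$. It builds $s(z)$ from a free parameter $\omega(z)$ and then runs the forward Schur steps to recover $\omega(z)$. The prescribed parameters, the free parameter `omega` and the sample point are hypothetical choices.

```python
from math import sqrt

def m_inv(gamma, z):
    """Coefficient matrix of one inverse Schur step (linear in z)."""
    c = 1 / sqrt(1 - abs(gamma) ** 2)
    return [[c * z, c * gamma], [c * gamma.conjugate() * z, c]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def W(gammas, z):
    """Product m_inv(gamma_0, z) * m_inv(gamma_1, z) * ... * m_inv(gamma_m, z)."""
    w = [[1, 0], [0, 1]]
    for g in gammas:
        w = matmul2(w, m_inv(g, z))
    return w

def lft(w, x):
    """Linear fractional transformation with coefficient matrix w."""
    return (w[0][0] * x + w[0][1]) / (w[1][0] * x + w[1][1])

def schur_step(value, gamma, z):
    """Forward step (value - gamma) / (z (1 - conj(gamma) value)), for z != 0."""
    return (value - gamma) / (z * (1 - gamma.conjugate() * value))

def omega(z):
    """Hypothetical free parameter; |omega(z)| <= 0.5 on the unit disk."""
    return 0.3 * z * z + 0.2j

gammas = [0.5 + 0j, -0.25 + 0.5j]  # prescribed strictly contractive parameters
z = 0.4 + 0.3j                     # sample point in the unit disk

s_z = lft(W(gammas, z), omega(z))  # value s(z) of the interpolant

# Running the forward algorithm on s(z) strips off gamma_0, gamma_1 and returns omega(z).
recovered = s_z
for g in gammas:
    recovered = schur_step(recovered, g, z)

s_at_zero = lft(W(gammas, 0), omega(0))  # equals gamma_0 for any choice of omega
```

Up to rounding, `recovered` equals `omega(z)`, `s_at_zero` equals the first prescribed parameter, and `abs(s_z)` does not exceed one.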

Schur did not mention either formula (9.39) or formula (9.44) explicitly in his description of the solutions of the interpolation problem of the form

$$\gamma_k(s) = \gamma_k, \quad k = 0, 1, \ldots, m. \tag{9.45}$$

Nor do the properties (9.42) and (9.43) of the matrix $W(z)$ in (9.44) appear in Schur's work. But this was the starting point of subsequent research on interpolation problems with constraints in various classes of analytic functions, particularly that of M. Riesz and R. Nevanlinna. The work [Sch9]-[Sch10] stimulated interest in obtaining matrices of linear fractional transformations that appear in descriptions of the sets of solutions of such problems. The methods described above are recursive and depend essentially upon formulas involving the Schur parameters. V.P. Potapov showed how to obtain an expression for the matrix $W(z)$ that appears in the description of the set of all solutions of the problem (9.45) in the class $\mathcal{S}$ directly in terms of the data $c_0, c_1, \ldots, c_m$, without first calculating the Schur parameters. This method of V.P. Potapov, as applied to the interpolation problem (9.26), is elaborated on in great detail in the monograph [DFK].
Considerations related to formula (9.39) were used by Schur to obtain the following result:

In order that the function $s(z) = \frac{p(z)}{q(z)} \in \mathcal{S}$, where $p(z)$ and $q(z)$ are coprime polynomials, be representable in the form $s(z) = [z;\ \gamma_0, \gamma_1, \ldots, \gamma_m]$, with $m < \infty$, it is necessary and sufficient that the following two conditions are satisfied:

1. The polynomial $q(z)$ does not vanish in the closed unit disc $\overline{\mathbb{D}}$;

2. The factorization identity $|q(t)|^2 - |p(t)|^2 = r$, where $r$ is a positive number, holds for $t \in \mathbb{T}$.

The circle of problems related to Schur functions and the Schur algorithm is closely related to the theory of polynomials that are orthogonal on the unit circle. Note that

$$\zeta \mapsto \frac{1+\zeta}{1-\zeta}$$

is a one-to-one mapping of the unit disk $\{\zeta : |\zeta| < 1\}$ onto the right half-plane $\{\zeta : \operatorname{Re} \zeta > 0\}$. Therefore, if $s(z)$ is a Schur function, then the function

$$w(z) = \frac{1 + z s(z)}{1 - z s(z)} \tag{9.46}$$

is a Carathéodory function, i.e., a function which is holomorphic and has non-negative real part in the unit disk:

$$\operatorname{Re} w(z) \ge 0 \quad \text{for } z \in \mathbb{D}. \tag{9.47}$$

The factor $z$ in (9.46) leads to the normalization

$$w(0) = 1. \tag{9.48}$$

Conversely, if $w(z)$ is a Carathéodory function that satisfies the normalization condition (9.48), then it can be uniquely represented in the form (9.46), where $s(z)$ is a Schur function. Every Carathéodory function $w(z)$ which satisfies the normalization condition (9.48) admits the Herglotz representation

$$w(z) = \int_{\mathbb{T}} \frac{t+z}{t-z}\, \sigma(dt), \tag{9.49}$$

where $\sigma$ is a probability measure on $\mathbb{T}$. Conversely, if $\sigma$ is a probability measure on $\mathbb{T}$, then formula (9.49) defines a normalized Carathéodory function $w(z)$. Thus, the transformation (9.46), together with the representation (9.49), establishes a one-to-one correspondence between Schur functions and probability measures on $\mathbb{T}$. It is easy to see that

$$s \in \mathcal{S}_{\mathrm{rin}} \iff \sigma \text{ has finite support} \iff w(z) \text{ is rational with all its poles on } \mathbb{T}.$$
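The correspondence between measures, Carathéodory functions and Schur functions can be made concrete for a measure with finite support. The sketch below assumes the Herglotz kernel $(t+z)/(t-z)$ and the relation $w = (1 + zs)/(1 - zs)$. It takes the hypothetical measure $\sigma = \tfrac12 \delta_{1} + \tfrac12 \delta_{-1}$, for which $w(z) = (1+z^2)/(1-z^2)$ and hence $s(z) = z$, a rational inner function. The function names are illustrative.

```python
def herglotz(points, weights, z):
    """w(z) = sum_j p_j (t_j + z)/(t_j - z) for a finitely supported probability measure."""
    return sum(p * (t + z) / (t - z) for t, p in zip(points, weights))

def schur_from_caratheodory(w, z):
    """Solve w = (1 + z s)/(1 - z s) for s at a point z != 0."""
    return (w - 1) / (z * (w + 1))

def caratheodory_from_schur(s, z):
    """The transformation w = (1 + z s)/(1 - z s)."""
    return (1 + z * s) / (1 - z * s)

z = 0.3 + 0.4j                          # sample point in the unit disk
w = herglotz([1, -1], [0.5, 0.5], z)    # sigma = (delta_1 + delta_{-1}) / 2
s = schur_from_caratheodory(w, z)       # equals z for this measure, up to rounding
```

The value `w` has positive real part, `herglotz([1, -1], [0.5, 0.5], 0)` equals one in accordance with the normalization, and `caratheodory_from_schur(s, z)` returns `w` again.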
Let $\sigma$ be a probability measure on $\mathbb{T}$ with infinitely many points of support and let $\{\varphi_k\}_{0 \le k < \infty}$ be a sequence of polynomials that is orthonormal with respect to $\sigma$. Such a sequence can be obtained by applying the Gram-Schmidt orthogonalization procedure to the sequence $\{z^k\}_{0 \le k < \infty}$. Let $\varphi_k^*(z) = z^k\, \overline{\varphi_k(1/\bar z)}$ denote the reciprocal polynomial. It turns out that the system of polynomials $\{\varphi_k, \varphi_k^*\}_{0 \le k < \infty}$ satisfies linear recurrence relations that can be written in the form:

$$\begin{bmatrix} \varphi_{k+1}(z) \\ \varphi_{k+1}^*(z) \end{bmatrix} = \frac{1}{\sqrt{1 - |a_k|^2}} \begin{bmatrix} z & -\overline{a_k} \\ -a_k z & 1 \end{bmatrix} \begin{bmatrix} \varphi_k(z) \\ \varphi_k^*(z) \end{bmatrix}, \quad 0 \le k < \infty, \tag{9.50}$$

with the initial condition

$$\begin{bmatrix} \varphi_0(z) \\ \varphi_0^*(z) \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{9.51}$$

Here $\{a_k\}_{0 \le k < \infty}$ is a strictly contractive sequence of complex numbers that is determined uniquely by the probability measure $\sigma$ that generates the sequence $\{\varphi_k\}_{0 \le k < \infty}$ of orthogonal polynomials. The numbers $a_k(\sigma)$ are termed the reflection coefficients of the measure $\sigma$ or of the sequence $\{\varphi_k\}$ of orthogonal polynomials.

THEOREM [Ya.L. Geronimus]. Let $s \in \mathcal{S} \setminus \mathcal{S}_{\mathrm{rin}}$ and let the normalized Carathéodory function $w(z)$ be related to $s$ by (9.46). Then the Schur parameters of $s$ coincide with the reflection coefficients of the measure $\sigma$ that represents $w$ in (9.49):

$$\gamma_k(s) = a_k(\sigma) \quad \text{for } 0 \le k < \infty.$$
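The theorem of Geronimus can be tested numerically on truncated power series. The sketch below rests on conventions that are assumptions of this illustration and are not taken from [Ger1] or [Ger2]: the moments are $c_n = \int_{\mathbb{T}} \bar t^{\,n}\, \sigma(dt)$, so that $w(z) = 1 + 2 \sum_{n \ge 1} c_n z^n$, and the reflection coefficients are produced by the recursion $\Phi_{k+1}(z) = z \Phi_k(z) - \overline{a_k}\, \Phi_k^*(z)$ for monic orthogonal polynomials, with $a_k$ fixed by the requirement that $\Phi_{k+1}$ be orthogonal to the constants. The sample function $s(z) = 0.5 + 0.3z$ and all function names are hypothetical.

```python
def series_div(num, den):
    """Power series quotient num/den (den[0] != 0), truncated to len(num) terms."""
    q = []
    for n in range(len(num)):
        acc = num[n] - sum(q[k] * den[n - k] for k in range(n) if n - k < len(den))
        q.append(acc / den[0])
    return q

def schur_parameters(s, count):
    """First `count` Schur parameters from the Taylor coefficients of s."""
    s = list(s)
    gammas = []
    for _ in range(count):
        g = s[0]
        gammas.append(g)
        num = s[1:]  # coefficients of (s - g)/z
        den = [(1 if k == 0 else 0) - g.conjugate() * s[k]
               for k in range(len(s) - 1)]  # coefficients of 1 - conj(g) s
        s = series_div(num, den)
    return gammas

def moments_from_schur(s):
    """Moments c_n read off from w = (1 + zs)/(1 - zs) = 1 + 2 sum_{n>=1} c_n z^n."""
    v = [0] + list(s)  # coefficients of z s(z)
    one_minus_v = [1 - v[0]] + [-x for x in v[1:]]
    c = series_div(v, one_minus_v)  # coefficients of zs/(1 - zs)
    c[0] = 1  # total mass of the probability measure
    return c

def reflection_coefficients(c, count):
    """Reflection coefficients from the moments via the recursion for monic polynomials."""
    def pair_with_one(p):
        # <P, 1> = integral of P(t) d sigma = sum_j p_j * conj(c_j)
        return sum(pj * c[j].conjugate() for j, pj in enumerate(p))
    phi = [1]  # coefficients of Phi_k, lowest degree first
    coefficients = []
    for _ in range(count):
        phi_star = [x.conjugate() for x in reversed(phi)]
        z_phi = [0] + phi
        a_bar = pair_with_one(z_phi) / pair_with_one(phi_star)
        coefficients.append(a_bar.conjugate())
        phi = [zp - a_bar * ps for zp, ps in zip(z_phi, phi_star + [0])]
    return coefficients

s_coeffs = [0.5, 0.3, 0, 0, 0, 0]  # s(z) = 0.5 + 0.3 z, bounded by 0.8 in the disk
gammas = schur_parameters(s_coeffs, 4)
alphas = reflection_coefficients(moments_from_schur(s_coeffs), 4)
```

Under these conventions the two lists agree up to rounding; both begin with 0.5 and 0.4. The Schur algorithm works directly with the function $s$, whereas the recursion on the right works with the moments of the measure, so the agreement is a genuine check and not a tautology.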



This theorem first appears in [Ger1]. It also appears as Theorem 18.2 in [Ger2]. Unfortunately, these papers are not easily accessible. However, an English translation of the second one is available. A simplified presentation of the cited theorem by Geronimus can be found in [Khru1] and in [PiNe].

Many (but not all) properties of a Schur function $s(z)$ can be naturally reformulated in terms of the related function $w(z)$, i.e., in terms of the related sequence of orthogonal polynomials. In particular: a Schur function $s(z)$ is inner if and only if the related measure $\sigma(dt)$ is singular. Indeed, $|s(t)| = 1$ if and only if $\operatorname{Re} w(t) = 0$. On the other hand, $\operatorname{Re} w(t) = \sigma'(t)$ for $m$-almost every $t \in \mathbb{T}$. (Here $s(t)$ and $w(t)$ are the boundary values of the appropriate functions and $\sigma'(t)$ is the derivative of the measure $\sigma$ with respect to the normalized Lebesgue measure $m$.) Some other connections between Schur functions and orthogonal polynomials can be found in [Gol1], [Gol2] and [Khru2].



There is a rich literature dedicated to the Schur algorithm and related topics. The volume [Ausg] contains a selection of basic early papers on Schur analysis written in German (by G. Herglotz, I. Schur, G. Pick, R. Nevanlinna, H. Weyl), as well as the Afterword written by B. Fritzsche and B. Kirstein, the editors of this volume. See also their survey [FrKi]. The last several years have witnessed an explosion of interest in generalizations of the Schur algorithm and the associated parametrization to matrix and operator valued functions. In particular, a $p \times q$ matrix valued function $s(z)$ that is holomorphic in $\mathbb{D}$ and satisfies the inequality

$$|\xi^* s(z)\, \eta| \le 1 \quad \text{for every point } z \in \mathbb{D} \text{ and every pair of unit vectors } \xi \in \mathbb{C}^p \text{ and } \eta \in \mathbb{C}^q$$

is said to belong to the Schur class $\mathcal{S}^{p \times q}$. The Schur algorithm is developed in detail for this class of functions in [DeDy]; see also [AlDy] for a reproducing kernel Hilbert space interpretation and additional generalizations to Pontryagin spaces and the references to both of these papers. There are also many papers by the team P. Delsarte, Y. Genin and Y. Kamp that are devoted to generalizations of a number of the themes discussed in this section; see e.g., [DeGeK] for a start. The books [Alp], [Con], [DFK] and [FoFr] are dedicated to function theoretic questions related to the Schur algorithm and its applications to operator theory. Applications of the Schur algorithm to Pisot and Salem numbers are considered in the book [BDGPS]. The Schur algorithm is also useful in the setting of fast numerical algorithms for systems of linear equations with structured matrices (Toeplitz, Hankel, etc.); see the book [S:Meth] (and in particular the papers [Kail] and [LevKai]). The terminology "I. Schur methods in signal processing" is now widely used in this connection. See the survey [KaSa1] and the volume [KaSa2] for further references in this direction.